Mass spectrometry-cleavable cross-linking agents to facilitate structural analysis of proteins and protein complexes, and method of using same

ABSTRACT

Novel cross-linking compounds that can be used in mass spectrometry, tandem mass spectrometry, and multi-stage tandem mass spectrometry to facilitate structural analysis of proteins and protein complexes are provided and have the formula:

[structural formula not reproduced]

where X is an N-hydroxy-succinimidyl or similar heterocyclic group. Also provided is a method of mapping protein-protein interactions of protein complexes using various mass spectrometry techniques.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation application of U.S. patent application Ser. No. 13/471,365, filed May 14, 2012, and issued as U.S. Pat. No. 9,222,943 on Dec. 29, 2015, which was based on U.S. provisional patent application No. 61/486,260, filed May 14, 2011, the entire contents of which are incorporated by reference herein.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

This invention was made with Government support under Grant No. GM074830, awarded by the National Institutes of Health. The Government has certain rights in this invention.

SEQUENCE LISTING IN ELECTRONIC FORMAT

A Sequence Listing in electronic format is provided as a file entitled UCI012_001C1_SEQLIST.TXT which is 32,341 bytes in size, which was created on May 10, 2016, and which was last modified on May 10, 2016.

FIELD OF THE INVENTION

The invention relates to the field of cross-linking agents and, more specifically, MS-cleavable cross-linkers that are diester derivatives of 3,3′-sulfinylbispropanoic acid, and the use of such compounds to facilitate structural analysis of proteins and protein complexes.

BACKGROUND OF THE INVENTION

Knowledge of elaborate structures of protein complexes is fundamental for understanding their functions and regulations. Although cross-linking coupled with mass spectrometry (MS) has been presented as a feasible strategy for structural elucidation of large multi-subunit protein complexes, this method has proven challenging due to technical difficulties in unambiguous identification of cross-linked peptides and determination of cross-linked sites by MS analysis.

Proteins form stable and dynamic multi-subunit complexes under different physiological conditions to maintain cell viability and normal cell homeostasis. Detailed knowledge of protein interactions and protein complex structures is fundamental to understanding how individual proteins function within a complex and how the complex functions as a whole. However, structural elucidation of large multi-subunit protein complexes has been difficult due to lack of technologies which can effectively handle their dynamic and heterogeneous nature. Traditional methods such as nuclear magnetic resonance (NMR) analysis and X-ray crystallography can yield detailed information on protein structures; however, NMR spectroscopy requires large quantities of pure protein in a specific solvent while X-ray crystallography is often limited by the crystallization process.

In recent years, chemical cross-linking coupled with mass spectrometry (MS) has become a powerful method for studying protein interactions. See for example the disclosures of Sinz, A. (2003) Chemical Cross-Linking and Mass Spectrometry for Mapping Three-Dimensional Structures of Proteins and Protein Complexes. J Mass Spectrom. 38, 1225-1237; Sinz, A. (2006) Chemical Cross-Linking and Mass Spectrometry to Map Three-Dimensional Protein Structures and Protein-Protein Interactions. Mass Spectrom Rev 25, 663-682; and Leitner, A., Walzthoeni, T., Kahraman, A., Herzog, F., Rinner, O., Beck, M., and Aebersold, R. (2010) Probing Native Protein Structures by Chemical Cross-Linking, Mass Spectrometry and Bioinformatics. Molecular & Cellular Proteomics 9, 1634-1649. Chemical cross-linking stabilizes protein interactions through the formation of covalent bonds and allows the detection of stable, weak and/or transient protein-protein interactions in native cells or tissues. See for example the disclosures of Sinz, A. (2010) Investigation of Protein-Protein Interactions in Living Cells by Chemical Crosslinking and Mass Spectrometry. Anal Bioanal Chem 397, 3433-3440; Vasilescu, J., Guo, X., and Kast, J. (2004) Identification of Protein-Protein Interactions Using in Vivo Cross-Linking and Mass Spectrometry. Proteomics 4, 3845-3854; Guerrero, C., Tagwerker, C., Kaiser, P., and Huang, L. (2006) An Integrated Mass Spectrometry-Based Proteomic Approach: Quantitative Analysis of Tandem Affinity-Purified in Vivo Cross-Linked Protein Complexes (QTAX) to Decipher the 26 S Proteasome-Interacting Network. Mol Cell Proteomics 5, 366-378; Tagwerker, C., Flick, K., Cui, M., Guerrero, C., Dou, Y., Auer, B., Baldi, P., Huang, L., and Kaiser, P. (2006) A Tandem Affinity Tag for Two-Step Purification under Fully Denaturing Conditions: Application in Ubiquitin Profiling and Protein Complex Identification Combined with in Vivo Cross-Linking. Mol Cell Proteomics 5, 737-748; Guerrero, C., Milenkovic, T., Przulj, N., Kaiser, P., and Huang, L. (2008) Characterization of the Proteasome Interaction Network Using a QTAX-Based Tag-Team Strategy and Protein Interaction Network Analysis. Proc Natl Acad Sci USA 105, 13333-13338; and Kaake, R. M., Milenkovic, T., Przulj, N., Kaiser, P., and Huang, L. (2010) Characterization of Cell Cycle Specific Protein Interaction Networks of the Yeast 26S Proteasome Complex by the QTAX Strategy. J Proteome Res 9, 2016-2019. In addition to capturing protein interacting partners, many studies have shown that chemical cross-linking can yield low-resolution structural information about the constraints within a molecule, see for example the disclosures of Sinz, A. (2006) Chemical Cross-Linking and Mass Spectrometry to Map Three-Dimensional Protein Structures and Protein-Protein Interactions. Mass Spectrom Rev 25, 663-682; Leitner, A., Walzthoeni, T., Kahraman, A., Herzog, F., Rinner, O., Beck, M., and Aebersold, R. (2010) Probing Native Protein Structures by Chemical Cross-Linking, Mass Spectrometry and Bioinformatics. Molecular & Cellular Proteomics 9, 1634-1649; and Back, J. W., de Jong, L., Muijsers, A. O., and de Koster, C. G. (2003) Chemical Cross-Linking and Mass Spectrometry for Protein Structural Modeling. J Mol Biol. 331, 303-313, or within a protein complex, as disclosed in Rappsilber, J., Siniossoglou, S., Hurt, E. C., and Mann, M. (2000) A Generic Strategy to Analyze the Spatial Organization of Multi-Protein Complexes by Cross-Linking and Mass Spectrometry. Anal Chem. 72, 267-275; Maiolica, A., Cittaro, D., Borsotti, D., Sennels, L., Ciferri, C., Tarricone, C., Musacchio, A., and Rappsilber, J. (2007) Structural Analysis of Multiprotein Complexes by Cross-Linking, Mass Spectrometry, and Database Searching. Mol Cell Proteomics 6, 2200-2211; and Chen, Z. A., Jawhari, A., Fischer, L., Buchen, C., Tahir, S., Kamenski, T., Rasmussen, M., Lariviere, L., Bukowski-Wills, J. C., Nilges, M., Cramer, P., and Rappsilber, J. (2010) Architecture of the RNA Polymerase II-TFIIF Complex Revealed by Cross-Linking and Mass Spectrometry. EMBO J 29, 717-726. The application of chemical cross-linking, enzymatic digestion, and subsequent mass spectrometric and computational analysis for the elucidation of three dimensional protein structures offers distinct advantages over traditional methods due to its speed, sensitivity, and versatility. Identification of cross-linked peptides provides distance constraints that aid in constructing the structural topology of proteins and/or protein complexes. Although this approach has been successful, effective detection and accurate identification of cross-linked peptides as well as unambiguous assignment of cross-linked sites remain extremely challenging due to their low abundance and complicated fragmentation behavior in MS analysis. See for example the disclosures of Sinz, A. (2006) Chemical Cross-Linking and Mass Spectrometry to Map Three-Dimensional Protein Structures and Protein-Protein Interactions. Mass Spectrom Rev 25, 663-682; Leitner, A., Walzthoeni, T., Kahraman, A., Herzog, F., Rinner, O., Beck, M., and Aebersold, R. (2010) Probing Native Protein Structures by Chemical Cross-Linking, Mass Spectrometry and Bioinformatics. Molecular & Cellular Proteomics 9, 1634-1649; Back, J. W., de Jong, L., Muijsers, A. O., and de Koster, C. G. (2003) Chemical Cross-Linking and Mass Spectrometry for Protein Structural Modeling. J Mol Biol. 331, 303-313; and Schilling, B., Row, R. H., Gibson, B. W., Guo, X., and Young, M. M. (2003) MS2Assign, Automated Assignment and Nomenclature of Tandem Mass Spectra of Chemically Crosslinked Peptides. J Am Soc Mass Spectrom. 14, 834-850.
Therefore, new reagents and methods are urgently needed to allow unambiguous identification of cross-linked products and to improve the speed and accuracy of data analysis to facilitate its application in structural elucidation of large protein complexes.

A number of approaches have been developed to facilitate MS detection of low abundance cross-linked peptides from complex mixtures. These include selective enrichment using affinity purification with biotinylated cross-linkers, for example, as described in Trester-Zedlitz, M., Kamada, K., Burley, S. K., Fenyo, D., Chait, B. T., and Muir, T. W. (2003) A Modular Cross-Linking Approach for Exploring Protein Interactions. J Am Chem Soc. 125, 2416-2425; Tang, X., Munske, G. R., Siems, W. F., and Bruce, J. E. (2005) Mass Spectrometry Identifiable Cross-Linking Strategy for Studying Protein-Protein Interactions. Anal Chem 77, 311-318; and Chu, F., Mahrus, S., Craik, C. S., and Burlingame, A. L. (2006) Isotope-Coded and Affinity-Tagged Cross-Linking (Icatxl): An Efficient Strategy to Probe Protein Interaction Surfaces. J Am Chem Soc 128, 10362-10363, and click chemistry with alkyne-tagged (Chowdhury, S. M., Du, X., Tolic, N., Wu, S., Moore, R. J., Mayer, M. U., Smith, R. D., and Adkins, J. N. (2009) Identification of Cross-Linked Peptides after Click-Based Enrichment Using Sequential Collision-Induced Dissociation and Electron Transfer Dissociation Tandem Mass Spectrometry. Anal Chem 81, 5524-5532) or azide tagged cross-linkers, see for example Kasper, P. T., Back, J. W., Vitale, M., Hartog, A. F., Roseboom, W., de Koning, L. J., van Maarseveen, J. H., Muijsers, A. O., de Koster, C. G., and de Jong, L. (2007) An Aptly Positioned Azido Group in the Spacer of a Protein Cross-Linker for Facile Mapping of Lysines in Close Proximity. Chembiochem 8, 1281-1292; and Nessen, M. A., Kramer, G., Back, J., Baskin, J. M., Smeenk, L. E., de Koning, L. J., van Maarseveen, J. H., de Jong, L., Bertozzi, C. R., Hiemstra, H., and de Koster, C. G. (2009) Selective Enrichment of Azide-Containing Peptides from Complex Mixtures. J Proteome Res 8, 3702-3711. 
In addition, Staudinger ligation has recently been shown to be effective for selective enrichment of azide-tagged cross-linked peptides (Vellucci, D., Kao, A., Kaake, R. M., Rychnovsky, S. D., and Huang, L. (2010) Selective Enrichment and Identification of Azide-Tagged Cross-Linked Peptides Using Chemical Ligation and Mass Spectrometry. J Am Soc Mass Spectrom 21, 1432-1445). Apart from enrichment, detection of cross-linked peptides can be achieved by isotope-labeled, as described in Collins, C. J., Schilling, B., Young, M., Dollinger, G., and Guy, R. K. (2003) Isotopically Labeled Crosslinking Reagents: Resolution of Mass Degeneracy in the Identification of Crosslinked Peptides. Bioorg Med Chem Lett. 13, 4023-4026; Petrotchenko, E. V., Olkhovik, V. K., and Borchers, C. H. (2005) Isotopically Coded Cleavable Cross-Linker for Studying Protein-Protein Interaction and Protein Complexes. Mol Cell Proteomics 4, 1167-1179; and Petrotchenko, E., and Borchers, C. (2010) Icc-Class: Isotopically-Coded Cleavable Crosslinking Analysis Software Suite. BMC bioinformatics 11, 64, fluorescently labeled (Sinz, A., and Wang, K. (2004) Mapping Spatial Proximities of Sulfhydryl Groups in Proteins Using a Fluorogenic Cross-Linker and Mass Spectrometry. Anal Biochem. 331, 27-32), and mass-tag labeled cross-linking reagents, for example as described in Tang, X., Munske, G. R., Siems, W. F., and Bruce, J. E. (2005) Mass Spectrometry Identifiable Cross-Linking Strategy for Studying Protein-Protein Interactions. Anal Chem 77, 311-318; and Back, J. W., Hartog, A. F., Dekker, H. L., Muijsers, A. O., de Koning, L. J., and de Jong, L. (2001) A New Crosslinker for Mass Spectrometric Analysis of the Quaternary Structure of Protein Complexes. J. Am. Soc. Mass Spectrom. 12, 222-227. 
These methods can identify cross-linked peptides with MS analysis, but interpretation of the data generated from inter-linked peptides (two peptides connected with the cross-link) by automated database searching remains difficult. Several bioinformatics tools have thus been developed to interpret MS/MS data and determine inter-linked peptide sequences from complex mixtures, as described in Maiolica, A. et al.; Schilling, B. et al.; Chu, F., Baker, P. R., Burlingame, A. L., and Chalkley, R. J. (2009) Finding Chimeras: A Bioinformatic Strategy for Identification of Cross-Linked Peptides. Mol Cell Proteomics 9, 25-31; Gao, Q., Xue, S., Shaffer, S. A., Doneanu, C. E., Goodlett, D. R., and Nelson, S. D. (2008) Minimize the Detection of False Positives by the Software Program Detectshift for 18o-Labeled Cross-Linked Peptide Analysis. Eur J Mass Spectrom (Chichester, Eng) 14, 275-280; Singh, P., Shaffer, S. A., Scherl, A., Holman, C., Pfuetzner, R. A., Larson Freeman, T. J., Miller, S. I., Hernandez, P., Appel, R. D., and Goodlett, D. R. (2008) Characterization of Protein Cross-Links Via Mass Spectrometry and an Open-Modification Search Strategy. Anal Chem 80, 8799-8806; Rinner, O., Seebacher, J., Walzthoeni, T., Mueller, L. N., Beck, M., Schmidt, A., Mueller, M., and Aebersold, R. (2008) Identification of Cross-Linked Peptides from Large Sequence Databases. Nat Methods 5, 315-318; Lee, Y. J., Lackner, L. L., Nunnari, J. M., and Phinney, B. S. (2007) Shotgun Cross-Linking Analysis for Studying Quaternary and Tertiary Protein Structures. J Proteome Res 6, 3908-3917; and Nadeau, O. W., Wyckoff, G. J., Paschall, J. E., Artigues, A., Sage, J., Villar, M. T., and Carlson, G. M. (2008) Crosssearch, a User-Friendly Search Engine for Detecting Chemically Cross-Linked Peptides in Conjugated Proteins. Mol Cell Proteomics 7, 739-749. 
Although promising, further developments are still needed to make such data analyses as robust and reliable as analyzing MS/MS data of single peptide sequences using existing database searching tools (e.g. Protein Prospector, Mascot or SEQUEST).

Various types of cleavable cross-linkers with distinct chemical properties have been developed to facilitate MS identification and characterization of cross-linked peptides. These include UV photocleavable (Nadeau, O. W., Wyckoff, G. J., Paschall, J. E., Artigues, A., Sage, J., Villar, M. T., and Carlson, G. M. (2008) Crosssearch, a User-Friendly Search Engine for Detecting Chemically Cross-Linked Peptides in Conjugated Proteins. Mol Cell Proteomics 7, 739-749), chemically cleavable (Kasper, P. T., et al.), isotopically-coded cleavable (Petrotchenko, E. V., et al.), and MS-cleavable reagents, as described in Tang, X., et al.; Back, J. W., et al.; Zhang, H., Tang, X., Munske, G. R., Tolic, N., Anderson, G. A., and Bruce, J. E. (2009) Identification of Protein-Protein Interactions and Topologies in Living Cells with Chemical Cross-Linking and Mass Spectrometry. Mol Cell Proteomics 8, 409-420; Soderblom, E. J., and Goshe, M. B. (2006) Collision-Induced Dissociative Chemical Cross-Linking Reagents and Methodology: Applications to Protein Structural Characterization Using Tandem Mass Spectrometry Analysis. Anal Chem 78, 8059-8068; Soderblom, E. J., Bobay, B. G., Cavanagh, J., and Goshe, M. B. (2007) Tandem Mass Spectrometry Acquisition Approaches to Enhance Identification of Protein-Protein Interactions Using Low-Energy Collision-Induced Dissociative Chemical Crosslinking Reagents. Rapid Commun Mass Spectrom 21, 3395-3408; Lu, Y., Tanasova, M., Borhan, B., and Reid, G. E. (2008) Ionic Reagent for Controlling the Gas-Phase Fragmentation Reactions of Cross-Linked Peptides. Anal Chem 80, 9279-9287; and Gardner, M. W., Vasicek, L. A., Shabbir, S., Anslyn, E. V., and Brodbelt, J. S. (2008) Chromogenic Cross-Linker for the Characterization of Protein Structure by Infrared Multiphoton Dissociation Mass Spectrometry. Anal Chem 80, 4807-4819.
MS-cleavable cross-linkers have received considerable attention since the resulting cross-linked products can be identified based on their characteristic fragmentation behavior observed during MS analysis. Gas-phase cleavage sites result in the detection of a “reporter” ion (Back, J. W., et al.), single peptide chain fragment ions (Soderblom, E. J., and Goshe, M. B.; Soderblom, E. J., Bobay, B. G., et al.; Lu, Y., et al.; and Gardner, M. W., et al.), or both reporter and fragment ions (Tang, X., et al.; and Zhang, H., et al.). In each case, further structural characterization of the peptide product ions generated during the cleavage reaction can be accomplished by subsequent MS^(n) analysis. Among these linkers, the “fixed charge” sulfonium ion-containing cross-linker developed by Lu et al. appears to be the most attractive because, based on their studies with model peptides, it allows specific and selective fragmentation of cross-linked peptides regardless of their charge and amino acid composition.

Despite the availability of multiple types of cleavable cross-linkers, most of the applications have been limited to the study of model peptides and single proteins. Additionally, complicated synthesis and fragmentation patterns have kept most of the known MS-cleavable cross-linkers from being widely adopted by the community.

SUMMARY OF THE INVENTION

The present invention provides novel cross-linking compounds that can be coupled with multi-stage tandem mass spectrometry (MS^(n)) to facilitate structural analysis of proteins and protein complexes. In a first aspect of the invention, a new crosslinking compound is provided and has the formula:

where x is selected from the group consisting of

wherein R is methyl or ethyl, and

Compounds of the general formula shown above are symmetric diester derivatives of 3,3′-sulfinylbispropanoic acid, also known as 3,3′-sulfinyldipropanoic acid, C₆H₁₀O₅S. Like the diacid, the diesters have two symmetric collision-induced dissociation (CID)-cleavable sites that allow effective identification of diester cross-linked peptides based on distinct fragmentation patterns that are unique to each cross-linking type (i.e. inter-link, intra-link, and dead-end).

In a second aspect of the invention, the new cross-linking agents are used to facilitate mapping of protein-protein interactions of protein complexes. In one embodiment, the method comprises the steps of providing an MS-cleavable cross-linker having the formula described above; forming a cross-linked protein complex by cross-linking proteins with the MS-cleavable cross-linker; forming protein and/or peptide fragments that are chemically bound to the MS-cleavable cross-linker by digesting the cross-linked protein complex with an enzyme such as trypsin; and using mass spectrometry (MS) and MS^(n) analysis to identify the protein and/or peptide fragments.

In another aspect of the invention, a method for integrated data analysis work flow for identification of cross-linked peptides is provided and comprises the steps of providing cross-linked peptides, each cross-linked peptide comprising an MS-cleavable cross-linker as described above; performing mass spectrometry on the cross-linked peptides to obtain MS data, MS/MS data, and MS³ data; identifying the MS/MS data comprising characteristic fragmentation profiles of MS-cleavable cross-linker-containing cross-linked peptides to obtain an MS/MS result comprising a list of parent ions corresponding to cross-linked peptide candidates; peptide sequencing the cross-linked peptides using the MS³ data to obtain an MS³ result comprising identities of cleaved cross-linked peptide fragments generated during MS/MS analysis; mass mapping the MS data using the list of parent ions corresponding to the cross-linked peptide candidates against a database comprising known protein sequences and the MS-cleavable cross-linker to obtain an MS result comprising possible cross-linked peptide sequences based on theoretical masses; and integrating the MS result, the MS/MS result, and MS³ result to identify cross-linked peptides.
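The final integration step described above can be illustrated with a minimal sketch. It is not the Link-Finder, Batch-Tag, or MS-Bridge implementation; the function name `integrate_results`, the data layout (parent ions keyed by an (m/z, charge) tuple), and the `min_support` threshold are assumptions introduced only for illustration. The sketch keeps a mass-mapped peptide pair only when its parent ion showed the characteristic MS/MS profile and at least one of its peptides was sequenced in MS³.

```python
from collections import defaultdict


def integrate_results(msms_parents, ms3_sequences, ms_candidates, min_support=1):
    """Combine the three result sets into a list of supported cross-links.

    msms_parents  : parent ions (m/z, z) flagged by their MS/MS fragment pairs
    ms3_sequences : parent ion -> set of peptide sequences identified by MS3
    ms_candidates : parent ion -> peptide pairs proposed by MS mass mapping
    min_support   : hypothetical threshold, number of peptides in a pair
                    that must have been sequenced in MS3
    """
    confirmed = defaultdict(list)
    for parent in msms_parents:
        sequenced = ms3_sequences.get(parent, set())
        for pair in ms_candidates.get(parent, []):
            # Count how many peptides of the candidate pair have MS3 evidence.
            support = sum(1 for peptide in pair if peptide in sequenced)
            if support >= min_support:
                confirmed[parent].append((pair, support))
    return dict(confirmed)


if __name__ == "__main__":
    parent = (419.97, 4)  # parent ion of the Ac-GDVEKGKK / KKGER inter-link
    result = integrate_results(
        msms_parents=[parent],
        ms3_sequences={parent: {"Ac-GDVEKGKK", "KKGER"}},
        ms_candidates={
            parent: [
                ("Ac-GDVEKGKK", "KKGER"),
                ("DECOYPEPA", "DECOYPEPB"),  # hypothetical competing candidate
            ]
        },
    )
    print(result)
```

Requiring agreement between independent lines of evidence is the design point: a candidate proposed by mass alone is discarded unless the MS/MS and MS³ data support it.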

BRIEF DESCRIPTION OF THE DRAWINGS

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.

FIG. 1 shows exemplary Compounds 1 and 3-9 and General Structure 2 according to the invention.

FIG. 2 shows proposed fragmentation schemes of DSSO-cross-linked peptides. A, DSSO synthesis and structure. B-D, MS/MS fragmentation patterns of the three types of DSSO-cross-linked peptides: interlinked (B), dead end (C), and intralinked (D). E, conversion of a sulfenic acid-modified fragment to an unsaturated thiol-modified fragment after a water loss. F, mass relationships between MS/MS fragment ions shown in B-D and their parent ions. DCC, N,N′-dicyclohexylcarbodiimide; MCPBA, m-chloroperbenzoic acid.
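The mass relationships referred to in panels E and F can be checked numerically. The sketch below assumes that cleavage at one CID-cleavable site splits an interlinked peptide into an alkene-modified fragment and a sulfenic acid-modified fragment without loss of mass, and that the unsaturated thiol fragment is the sulfenic acid fragment minus one water, so that alkene + thiol + H₂O should equal the parent mass. The function names and the 0.05 Da tolerance are hypothetical; the proton and water masses are standard monoisotopic values.

```python
PROTON = 1.007276   # mass of a proton in Da
WATER = 18.010565   # monoisotopic mass of H2O in Da


def neutral_mass(mz, z):
    """Convert an observed m/z at charge z to a neutral monoisotopic mass."""
    return z * (mz - PROTON)


def pair_matches_parent(parent, alkene, thiol, tol_da=0.05):
    """Check an alkene/unsaturated-thiol fragment pair against its parent ion.

    Each argument is an (m/z, charge) tuple. One water is added back because
    the thiol fragment is assumed to be the sulfenic acid fragment after a
    water loss. tol_da is an illustrative tolerance, not a recommended value.
    """
    expected = neutral_mass(*alkene) + neutral_mass(*thiol) + WATER
    return abs(neutral_mass(*parent) - expected) <= tol_da


if __name__ == "__main__":
    # Values listed in the tables for Ac-GDVEKGKK inter-linked to KKGER:
    # parent 419.97 (4+), alkene fragment 478.76 (2+), thiol fragment 352.18 (2+)
    print(pair_matches_parent((419.97, 4), (478.76, 2), (352.18, 2)))  # True
```

A pair that fails this check is unlikely to originate from the same interlinked parent ion, which is what makes the fragment pairs diagnostic.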

FIG. 3 is an exemplary MS^(n) analysis of DSSO-cross-linked model peptides. A-E, MS^(n) analysis of the DSSO-interlinked Ac-IR7 (α-α). A, MS spectrum of α-α: [α-α]³⁺ (m/z 615.97³⁺) and [α-α]²⁺ (m/z 923.46²⁺). B and C, MS/MS spectra of [α-α]³⁺ (B) and [α-α]²⁺ (C) in which alkene (α_(A)) and sulfenic acid (α_(S)) fragments were detected. D and E, MS³ spectra of α_(A) (m/z 449.66²⁺) (D) and α_(S) (m/z 948.43) (E). F-I, MS^(n) analysis of DSSO-interlinked Ac-myelin (β-β). F, MS spectrum of β-β: [β-β]⁶⁺ (m/z 458.23⁶⁺), [β-β]⁵⁺ (m/z 549.68⁵⁺), and [β-β]⁴⁺ (m/z 686.84⁴⁺). G-I, MS/MS spectra of [β-β]⁶⁺ in which the β_(A)/β_(T) pair was observed (G), [β-β]⁵⁺ in which the β_(A)/β_(S) pair was observed (H), and [β-β]⁴⁺ in which the β_(A)/β_(S) pair was observed (I). J-L, MS^(n) analysis of DSSO dead end-modified substance P peptide γ_(DN). J, MS spectrum of γ_(DN) (m/z 538.76²⁺). K, MS/MS spectrum of γ_(DN) in which two fragments, γ_(A) (m/z 478.03²⁺) and γ_(S) (m/z 502.95²⁺), were detected. L, MS³ spectrum of γ_(A) (m/z 478.03²⁺). Sequences of Ac-IR7, Ac-myelin, and substance P are Ac-IEAEKGR (SEQ ID NO: 2), Ac-ASQKRPSQRHG (SEQ ID NO: 6), and RPKPQQF (SEQ ID NO: 7), respectively.

FIG. 4 is an exemplary MS^(n) analysis of DSSO heterodimeric interlinked peptide of cytochrome c (α-β: Ac-GDVEKGKK (SEQ ID NO: 11) interlinked to KKGER (SEQ ID NO: 13)). A, MS/MS spectrum of [α-β]⁴⁺ (m/z 419.9716⁴⁺) in which two fragment pairs were observed: α_(A) (m/z 478.99²⁺)/β_(T) (m/z 352.40²⁺) and α_(T) (m/z 494.96²⁺)/β_(A) (m/z 336.42²⁺). B, MS³ spectrum of α_(A) (m/z 478.99²⁺) in which detection of y₁-y₇ and b₂-b₇ determined the sequence unambiguously as Ac-GDVEK_(A)GKK (SEQ ID NO: 12). C, MS³ spectrum of β_(T) (m/z 352.40²⁺) in which detection of y₁-y₄, a₁, and b₂-b₇ ions determined the sequence unambiguously as K_(T)KGER (SEQ ID NO: 14). K_(A) is modified with the alkene moiety, and K_(T) is modified with the unsaturated thiol moiety.

FIG. 5 is an exemplary MS^(n) analysis of DSSO heterodimeric interlinked peptide of cytochrome c (α-β: HKTGPNLHGLFGR (SEQ ID NO: 16) interlinked to GKK). This peptide was detected in MS as triply charged [α-β]³⁺ (m/z 641.6730³⁺), quadruply charged [α-β]⁴⁺ (m/z 481.5069⁴⁺), and quintuply charged [α-β]⁵⁺ (m/z 385.4070⁵⁺) ions. A, MS/MS spectrum of [α-β]³⁺ (m/z 641.6730³⁺) in which two fragment pairs were observed: α_(A) (m/z 744.40²⁺)/β_(T) (m/z 418.21) and α_(T) (m/z 760.38²⁺)/β_(A) (m/z 386.24). B, MS/MS spectrum of [α-β]⁴⁺ (m/z 481.5069⁴⁺) in which two fragment pairs were observed: α_(A) (m/z 496.60³⁺)/β_(T) (m/z 418.21) and α_(T) (m/z 507.26³⁺)/β_(A) (m/z 386.24). C, MS/MS spectrum of [α-β]⁵⁺ (m/z 385.4070⁵⁺) in which two fragment pairs were observed: α_(A)/β_(T) (m/z 496.60³⁺/209.61²⁺ and 372.70⁴⁺/418.21) and α_(T) (m/z 507.26³⁺)/β_(A) (m/z 193.62²⁺). D, MS³ spectrum of α_(A) fragment (m/z 496.60³⁺) in which detection of a series of y and b ions determined its sequence unambiguously as HK_(A)TGPNLHGLFGR (SEQ ID NO: 17). K_(A) is modified with the alkene moiety.
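As a worked check of the three charge states reported for this peptide, the sketch below converts each observed m/z to a neutral mass and confirms that the values agree. The function names and the 5 ppm tolerance are hypothetical and chosen only for illustration; the proton mass is the standard value.

```python
PROTON = 1.007276  # mass of a proton in Da


def neutral_mass(mz, z):
    """Convert an observed m/z at charge z to a neutral monoisotopic mass."""
    return z * (mz - PROTON)


def charge_states_agree(observations, tol_ppm=5.0):
    """Return True if all (m/z, z) observations imply the same neutral mass.

    tol_ppm is an illustrative tolerance on the spread between the largest
    and smallest neutral mass, relative to their mean.
    """
    masses = [neutral_mass(mz, z) for mz, z in observations]
    spread = max(masses) - min(masses)
    mean = sum(masses) / len(masses)
    return spread / mean * 1e6 <= tol_ppm


if __name__ == "__main__":
    # [α-β]3+, [α-β]4+ and [α-β]5+ as reported for FIG. 5
    observed = [(641.6730, 3), (481.5069, 4), (385.4070, 5)]
    for mz, z in observed:
        print(z, round(neutral_mass(mz, z), 4))
    print(charge_states_agree(observed))  # True
```

All three charge states correspond to a neutral mass of about 1922.00 Da, consistent with a single interlinked species.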

FIG. 6 is an exemplary MS^(n) analysis of DSSO dead end-modified peptide (A and B) and intralinked peptide of cytochrome c (C and D). A, MS/MS spectrum of a dead end-modified peptide (α_(DN); m/z 880.8975²⁺, K_(DN)TGQAPGFSYTDANK (SEQ ID NO: 20)) in which two fragment ions were determined as α_(A) (m/z 820.20²⁺) and α_(T) (m/z 835.88²⁺). B, MS³ spectrum of α_(A) (m/z 820.20²⁺) in which detection of a series of y and b ions determined its sequence unambiguously as K_(A)TGQAPGFSYTDANK (SEQ ID NO: 21). C, MS/MS spectrum of an intralinked peptide (α_(intra); m/z 611.9802³⁺, GGK*HK*TGPNLHGLFGR (SEQ ID NO: 24)) in which one fragment ion was observed and determined as α_(A+T) (m/z 606.24³⁺). D, MS³ spectrum of α_(A+T) (m/z 606.24³⁺) in which detection of a series of y and b ions determined the presence of a mixture of GGK_(A)HK_(T)TGPNLHGLFGR (SEQ ID NO: 25) and GGK_(T)HK_(A)TGPNLHGLFGR (SEQ ID NO: 26). K_(A) is modified with the alkene moiety, and K_(T) is modified with the unsaturated thiol moiety.

FIG. 7 shows A, the integrated data analysis work flow for identifying DSSO-crosslinked peptides by LC MS^(n) and B, the work flow for the Link-Finder program.

FIG. 8 is an exemplary MS^(n) analysis of DSSO heterodimeric interlinked peptide of the yeast 20 S proteasome complex (α-β: NKPELYQIDYLGTK (SEQ ID NO: 27) interlinked to LGSQSLGVSNKFEK (SEQ ID NO: 29)) with intersubunit link between 20 S subunit β4 and β3. A, MS/MS spectrum of [α-β]⁴⁺ (m/z 833.9231⁴⁺) in which two fragment pairs were detected and determined as α_(A) (m/z 868.52²⁺)/β_(T) (m/z 790.55²⁺) and α_(T) (m/z 884.98²⁺)/β_(A) (m/z 774.32²⁺). B, MS³ spectrum of α_(A) (m/z 868.52²⁺) in which detection of a series of y and b ions determined its sequence unambiguously as NK_(A)PELYQIDYLGTK (SEQ ID NO: 28). C, MS³ spectrum of β_(T) (m/z 790.55²⁺) in which detection of a series of y and b ions determined its sequence unambiguously as LGSQSLGVSNK_(T)FEK (SEQ ID NO: 30). K_(A) is modified with the alkene moiety, and K_(T) is modified with the unsaturated thiol moiety.

FIG. 9 shows a mapping of the identified DSSO-interlinked lysines onto the crystal structure of the yeast 20 S proteasome. The lysines forming intrasubunit cross-links appear space-filled in blue, and those forming intersubunit cross-links appear space-filled in red.

FIG. 10 is a flowchart showing a general technique for identifying crosslinked peptides according to one embodiment of the invention.

FIG. 11A is an exemplary MS³ analysis of DSSO inter-linked peptides of cytochrome c.

FIG. 11B is an exemplary MS³ analysis of ubiquitin.

FIG. 12 is an exemplary SDS-PAGE gel picture of the 20S proteasome cross-linked with various molar ratios of cross-linker DSSO, i.e. 1:100, 1:500 and 1:1000. The 20S proteasome without cross-linking served as a control. The cross-linked proteasome complex was separated using 4-20% gradient gel.

FIG. 13 is an exemplary MS³ analysis of DSSO inter-linked peptides of the yeast 20S proteasome complex.

FIG. 14 is an exemplary MS^(n) analysis of a DSSO dead-end peptide of the yeast 20S proteasome complex. A) MS/MS spectrum of a dead-end (DN) peptide (α_(DN), m/z 693.0078³⁺, AELEK_(DN)LVDHHPEGLSAR (SEQ ID NO: 110)), in which two fragment ions were determined as α_(A) (m/z 652.67³⁺) and α_(T) (m/z 663.33³⁺); B) MS³ spectrum of α_(A) (m/z 652.67³⁺), in which detection of a series of y and b ions determined its sequence unambiguously as AELEK_(A)LVDHHPEGLSAR (SEQ ID NO: 111), in which K_(A) is modified with the alkene moiety. The sequence matched to subunit α7; C) MS³ spectrum of α_(T) (m/z 663.33³⁺), in which detection of a series of y and b ions determined its sequence unambiguously as AELEK_(T)LVDHHPEGLSAR (SEQ ID NO: 112), in which K_(T) is modified with the unsaturated thiol moiety.

DESCRIPTION OF THE TABLES

TABLE 1 Summary of DSSO-interlinked peptides of cytochrome c identified by LC MS^(n).

TABLE 2 Summary of DSSO-interlinked peptides of the yeast 20 S proteasome complex identified by LC MS^(n).

TABLE 3 Summary of DSSO cross-linked peptides—DSSO dead-end, intra-linked and multilinked peptides—of cytochrome c by LC MS^(n).

TABLE 4 Summary of DSSO cross-linked peptides of ubiquitin by LC MS^(n).

TABLE 5 Summary of DSSO inter-linked and dead-end peptides of the yeast 20S proteasome complex by LC MS^(n).

TABLE 6 Peptide sequences with their corresponding SEQ ID NOs.

TABLE 1
Type | Peptide Sequence | AA Location | MS m/z (Observed) | z | Δ (PPM) | Mod. Position | m/z sequenced in MS³ | z | Distance (Cα-Cα) | References
2 | Ac-GDVEKGK | G1-K7 | 565.30 | 3 | 1 | K_(T)5 | 860.38 | 1 | 5.3 Å | 19, 20, 21, 31
  | KIFVQK | K8-K13 | | | | K_(A)8 | 408.75 | 2 | |
2 | Ac-GDVEKGK | G1-K7 | 603.81 | 2 | 0 | K_(A)5 | 828.41 | 1 | 13.0 Å | 21, 31, 43
  | KK | K87-K88 | | | | K87* | 860.38 | | |
2 | Ac-GDVEKGK | G1-K7 | 516.93 | 3 | 0 | K_(T)5 | 660.38 | 1 | 13.0 Å | 21, 31
  | KKGER | K87-R91 | | | | K_(A)87 | 336.20 | 2 | |
2 | Ac-GDVEKGK | G1-K7 | 474.23 | 3 | 2 | K_(A)5 | 414.71 | 2 | 13.0 Å | N/A
  | KGER | K88-R91 | | | | K88* | | | |
2 | Ac-GDVEKGK | G1-K7 | 675.35 | 3 | 4 | K_(T)5 | 860.38 | 1 | 13.2 Å | N/A
  | EDLIAYLKK | E92-K100 | | | | K_(A)99 | 573.83 | 2 | |
2 | Ac-GDVEKGKK | G1-K8 | 445.57 | 3 | 1 | K_(A)7 | 478.76 | 2 | 15.7 Å | 21, 31
  | KK | K87-K88 | | | | K87* | | | |
2 | Ac-GDVEKGKK | G1-K8 | 419.97 | 4 | 0 | K_(A)7 | 478.76 | 2 | 15.7 Å | 21, 31
  | KKGER | K87-K91 | | | | K_(T)87 | 352.418 | 2 | |
2 | GKK | G6-K8 | 641.67 | 3 | 0 | K7* | 760.39 | 2 | 18.7 Å | 14, 31, 43
  | HKTGPNLHGLFGR | H26-R38 | | | | K_(T)27 | | | |
2 | GKK | G6-K8 | 526.26 | 2 | 0 | K7* | 616.29 | 1 | 9.9 Å | 21, 43
  | KATNE | K100-E104 | | | | K_(A)100 | | | |
2 | KIFVQK | K8-K13 | 398.90 | 3 | 2 | K_(T)8 | 424.74 | 2 | 14.8 Å | 31
  | KK | K87-K88 | | | | K87* | | | |
2 | KIFVQK | K8-K13 | 384.97 | 4 | 2 | K_(A)8 | 408.75 | 2 | 14.8 Å | 31
  | KKGER | K87-R91 | | | | K_(T)87 | 352.18 | 2 | |
2 | KIFVQK | K8-K13 | 494.59 | 3 | 2 | K_(A)8 | 406.75 | 2 | 13.7 Å | 21, 31
  | KATNE | K100-E104 | | | | K100* | | | |
2 | GGKHK | G23-K27 | 756.70 | 3 | 2 | K_(T)25 | 612.29 | 2 | 19.3 Å | N/A
  | KTGQAPGFSYTDANK | K39-K53 | | | | K_(A)39 | 819.89 | 2 | |
2 | KTGQAPGFSYTDANK | K39-K53 | 945.47 | 3 | 3 | K_(A)39 | 819.89 | 2 | 15.1 Å | 31
  | EDLIAYLKK | E92-K100 | | | | K_(T)99 | 1178.62 | 1 | |
2 | KTGQAPGFSYTDANK | K39-K53 | 768.69 | 3 | 0 | K_(T)99 | 835.88 | 2 | 18.0 Å | 21, 31, 43
  | KATNE | K100-E104 | | | | K100* | | | |
2 | TGQAPGFSYTDANKNK | T40-K55 | 1104.21 | 3 | 2 | K_(T)53 | 892.90 | 2 | 11.6 Å | 31
  | YIPGTKMoxIFAGIK | Y74-K86 | | | | K_(A)79 | 1508.82 | 1 | |
2 | KYIPGTK | K73-K79 | 629.68 | 3 | 2 | K_(T)73^(‡) | 892.43 | 1 | 13.2 Å | 31
  | MoxIFAGIKK | M80-K87 | | | | K_(T)86^(‡) | 1009.52 | 1 | |
2 | MIFAGIKK | M80-K87 | 389.21 | 4 | 2 | K_(T)86 | 497.27 | 2 | 6.4 Å | 31
  | KGER | K88-R91 | | | | K88* | | | |
2 | MoxIFGIKK | M80-K87 | 393.21 | 4 | 2 | K_(T)86 | 505.27 | 2 | 6.4 Å | 31
  | KGER | K88-R91 | | | | K88* | | | |
*They were identified from different pair ions by MS³.
^(‡)They were identified from different fragment pair ions by MS³.
Note: All of the inter-linked peptides displayed characteristic fragment pairs and were identified by Batch-Tag, MS-Bridge, and Link-Finder.

TABLE 2
Type | Peptide Sequence | Subunit | AA Location | MS m/z (Observed) | z | Δ (PPM) | Mod. Position | m/z sequenced in MS³ | z | Distance (Cα-Cα)
2 | ATATGPKQQEITTNLENHFK | α1 (PRS2/SCL1) | A168-K187 | 595.10 | 5 | 2 | K_(A)174 | 571.29 | 4 | 14.8 Å
  | KVPDK | α1 (PRS2/SCL1) | K58-K62 | | | | K_(T)58 | 672.34 | 1 |
2 | KVAHTSYK | α2 (PRE8) | K91-K98 | 477.51 | 4 | 2 | K_(T)91 | 510.25 | 2 | 5.1 Å
  | VLVDKSR | α2 (PRE8) | V84-R90 | | | | K_(A)88 | 435.76 | 2 |
2 | IFKPQEIK | α3 (PRE9) | I229-K236 | 514.03 | 4 | 0 | K_(T)231 | 544.80 | 2 | 14.2 Å
  | LYKLNDK | α3 (PRE9) | L66-K72 | | | | K_(A)68 | 474.26 | 2 |
2 | IHAQNYLKTYNEDIPVEILVR | α3 (PRE9) | I93-R113 | 904.47 | 4 | 1 | K_(T)100 | 1307.58 | 2 | 10.6 Å
  | YKTNLYK | β3 (PUP3) | Y69-K75 | | | | K_(A)70 | 492.27 | 2 |
2 | EFLKNYDR | α4 (PRE6) | E173-R181 | 692.33 | 3 | 2 | K_(A)177^(‡) | 634.30 | 2 | 13.1 Å
  | NSKTVR | α4 (PRE6) | N167-R172 | | | | K_(A)169^(‡) | 379.71 | 2 |
2 | ILKQVMEEK | α5 (PUP2) | I203-K211 | 641.01 | 3 | 0 | K_(T)205 | 602.31 | 2 | 10.5 Å
  | ELEK | α5 (PUP2) | E242-K246 | | | | K244* | | |
2 | SYKFPR | β2 (PUP1)^(†) | S202-R207 | 539.26 | 3 | 1 | K_(A)204 | 426.23 | 2 | 12.1 Å
  | EEKQK | β2 (PUP1)^(†) | E197-K201 | | | | K_(T)199 | 747.34 | 1 |
2 | YKTNLYK | β3 (PUP3) | Y69-K42 | 587.64 | 3 | 2 | K_(A)70^(‡) | 492.26 | 2 | 10.7 Å
  | LKEER | β3 (PUP3) | Y199-R203 | | | | K_(A)77^(‡) | 364.70 | 2 |
2 | LGSQSLGVSNKFEK | β3 (PUP3) | L29-K42 | 595.05 | 4 | 2 | K_(T)39 | 790.40 | 2 | 13.2 Å
  | YLKMoxR | β3 (PUP3) | Y199-R203 | | | | K_(A)201 | 390.71 | 2 |
2 | NKPELYQIDYLGTK | β4 (PRE1) | N112-R203 | 833.92 | 4 | 0 | K_(A)113 | 868.45 | 2 | 19.1 Å
  | LGSQSLGVSNKFEK | β3 (PUP3) | L29-K42 | | | | K_(T)39 | 790.39 | 2 |
2 | VQDSVILASSKAVTR | β4 (PRE1) | V9-R23 | 633.74 | 5 | 1 | K_(A)19 | 543.30 | 3 | 7.8 Å
  | GISVLKDSDDKTR | β4 (PRE1) | G24-R36 | | | | K_(T)29 | 460.38 | 2 |
2 | FKNSVK | β6 (PRE7)^(†) | F59-K64 | 532.29 | 3 | 2 | K_(T)60 | 808.40 | 1 | 16.2 Å
  | KLAVER | α6 (PRE5) | K102-R107 | | | | K_(A)102 | 385.23 | 2 |
2 | NQYEPGTNGKVK | β6 (PRE7)^(†) | N149-K160 | 659.68 | 3 | 0 | K_(A)158 | 694.84 | 2 | 9.8 Å
  | KPLK | β6 (PRE7)^(†) | K161-K164 | | | | K161* | | |
*Peptide fragments containing these sites were not sequenced by MS³.
^(‡)They were identified from different fragment pair ions by MS³.
^(†)Mature sequence from crystal data was used for data analysis.
Note: All of the inter-linked peptides displayed characteristic fragment pairs and were identified by Batch-Tag, MS-Bridge, and Link-Finder.

TABLE 3 MS m/z Expect- Identified Peptide AA m/z Δ Mod. sequenced Peptide ation in other Type Sequence Location (Observed) z (PPM) Position in MS3 z Score Value Refs 0 Ac-GDVEKGKK G1-K8 539.76 2 1 K_(T)5 494.74 2 22.7 1.90E-05 21 (SEQ ID NO: 11) 0 KIFVQK K8-K13 469.76 2 2 K_(A)8 408.75 2 19.1 1.00E-04 19, 20, 21 (SEQ ID NO: 35) 31 0 KTGQAPGFSYTDANK K39-K53 880.90 2 2 K_(T)39 838.88 2 41.5 2.10E-10 19, 20, 21 (SEQ ID NO: 19) 41 0 TGQAPGFSYTDANKNK T40-K55 937.92 2 0 K_(T)53 892.90 2 28.8 4.60E-08 19, 31 (SEQ ID NO: 46) 0 KYIPGTK K73-K79 491.75 2 2 K_(A)73 430.75 2 23.9 1.40E-05 20, 21, 31 (SEQ ID NO: 51) 0 YIPGTKMoxIFAGIK y74-K86 815.92 2 2 K_(T)79 770.90 2 18.3 5.00E-06 19, 31 (SEQ ID NO: 49) 0 MoxIFAGIKK M80-K87 550.28 2 1 K_(T)86 505.27 2 22.0 4.20E-06 31 (SEQ ID NO: 54) 0 EDLIAYLKK E92-K100 634.83 2 1 K_(A)99 573.83 2 32.9 2.70E-07 21, 31 (SEQ ID NO: 39) MS m/z Expect- Identified Peptide AA m/z Δ Mod. sequenced Peptide ation Distance in other Type Sequence Location (Observed) z (PPM) Position in MS3 z Score Value (Cα-Cα) Refs 1 Ac-GDVEKGKK G1-K8 530.75 2 2 K_(A)5, K_(T)7 521.76 2 19.5 6.20E-05  5.4 Å 21 (SEQ ID NO: 11) 1 GGKHKTGPNLHGLFGR G23-R38 611.98 3 0 K_(A)25, K_(T)27 605.98 3 37.7 2.80E-08  6.3 Å 14, 19, 20, (SEQ ID NO: 23) 21, 31, 42 1 KYIPGTKMoxIFAGIK K73-K86 870.96 2 2 K73, K79* — — — 12.1 Å 31 (SEQ ID NO: 114) 1 KYIPGTKMoxIFAGIKK K73-K87 623.67 3 2 K73, K86* — — — 13.2 Å 31 (SEQ ID NO: 116) 1 MoxIFAGIKKK M80-K88 605.32 2 2 K_(A)86, K_(T)87 596.32 2 29.5 1.10E-08 — 14, 19, 20, (SEQ ID NO: 118) 21, 31, 42 1 KKGER K87-R91 388.19 2 1 K_(A)87, K_(S)88* — — — — 20, 21 (SEQ ID NO: 13) 1 EDLIAYLKKATNE E92-E104 833.41 2 3 K99, K_(T)100 824.40 2 28.7 1.50E-06 — 20, 21 (SEQ ID NO: 119) MS m/z Best Best Ex- Peptide AA m/z Δ Mod. 
sequenced Discovery pectation Distance Ref- Type Sequence Location (Observed) z (PPM) Position in MS3 z Score Value (Cα-Cα) erences 2 Ac-GDVEKGK G1-K7 565.30 3 1 K_(T)5 860.38 1 19.7 2.70E-05  5.3 Å 19, 20,  (SEQ ID NO: 32) 21, 31 KIFVQK K8-K13 K_(A)8 408.75 2 20.3 1.90E-05 (SEQ ID NO: 35) 2 Ac-GDVEKGK G1-K7 603.81 2 0 K_(A)5 828.41 1 23.1 2.70E-06 13.0 Å 21, 31,  (SEQ ID NO: 32) 43 KK K87-K88 K87* 2 Ac-GDVEKGK G1-K7 516.93 3 0 K_(T)5 860.38 1 19.7 2.70E-06 13.0 Å 21, 31 (SEQ ID NO: 32) KKGER K87-R91 K_(A)87 336.20 2 14.8 1.50E-04 (SEQ ID NO: 23) 2 Ac-GDVEKGK G1-K7 474.23 3 2 K_(A)5 414.71 2 25.5 8.60E-07 13.0 Å — (SEQ ID NO: 32) KGER K88-R91 k88* (SEQ ID NO: 38) 2 Ac-GDVEKGK G1-K7 675.35 3 4 K_(T)5 860.38 1 19.7 2.70E-05 13.2 Å — (SEQ ID NO: 32) EDLIAYLKK E92-K10 K_(A)99 573.83 2 32.9 2.10E-07 (SEQ ID NO: 30) 2 Ac-GDVEKGKK G1-K8 445.57 3 1 K_(A)7 478.76 2 23.1 7.50E-06 15.7 Å 21, 31 (SEQ ID NO: 11) KK K87-K88 K87* 2 Ac-GDVEKGKK G1-K8 419.97 4 0 K_(A)7 478.76 2 22.0 2.20E-05 15.7 Å 21, 31 (SEQ ID NO: 11) KKGER K87-K91 K_(T)87 352.18 2 15.5 1.40E-03 (SEQ ID NO: 13) 2 GKK G6-K8 641.67 3 0 K7* 18.7 Å 14, 31, HKTGPNLHGLFGR H26-R38 K_(T)27 760.39 2 35.0 7.10E-11 43 (SEQ ID NO: 16) 2 GKK G6-K8 526.26 2 0 KT*  9.9 Å 21, 43 KATNE K100-E104 K_(A)100 616.29 1 14.2 2.40E-09 (SEQ ID NO: 42) 2 KIFVQK K8-K13 398.90 3 2 KT8 424.74 2 19.4 1.40E-04 14.8 Å 31 (SEQ ID NO: 35) KK K87-K88 KT87* 2 KIFVQK K8-K13 384.97 4 2 KA8 408.75 2 20.3 1.90E-05 14.8 Å 31 (SEQ ID NO: 35) KKGER K87-K91 KT87 352.18 2 15.0 1.00E-04 (SEQ ID NO: 13) 2 KIFVQK K8-K13 494.59 3 2 KA8 408.75 2 20.6 3.20E-05 13.7 Å 21, 31 (SEQ ID NO: 35) KATNE K100-E104 K100* (SEQ ID NO: 42) 2 GGKHK G23-K27 756.70 3 2 KT25 612.29 1 9.0# 8.00E-03 19.3 Å — (SEQ ID NO: 44) KTGQAPGFSYTDANK K39-K53 KA39 819.89 2 44.7 5.70E-11 (SEQ ID NO: 19) 2 KTGQAPGFSYTDANK K39-K53 945.47 3 3 KA39 819.89 2 42.5 2.50E-10 15.1 Å 31 (SEQ ID NO: 19) EDLIAYLKK E92-K100 KT99 1178.62 1 22.9 1.80E-05 (SEQ ID NO: 39) 2 KTGQAPGFSYTDANK K39-K53 768.69 3 0 
KT39 835.88 2 39.9 1.20E-09 18.0 Å 21, 31, (SEQ ID NO: 19) 43 KATNE K100-E104 K100* (SEQ ID NO: 42) 2 TGQAPGFSYTDANKNK T40-K55 1104.21 3 2 KT53 892.90 2 28.8 4.60E-08 11.6 Å 31 (SEQ ID NO: 46) YIPGTKMoxIFAGIK Y74-K86 KA79 1508.82 1 9.3# 1.00E-03 (SEQ ID NO: 49) 2 KYIPGTK K73-K79 629.68 3 2 KT73^(‡) 892.46 1 17.6 2.00E-05 13.2 Å 31 (SEQ ID NO: 51) MoxIFAGIKK M80-K87 KT86^(‡) 1009.52 1 15.0 2.10E-05 (SEQ ID NO: 54) 2 MIFAGIKK M80-K87 389.21 4 2 KT86 497.27 2 18.9 5.00E-05  6.4 Å 31 (SEQ ID NO: 53) KGER K88-R91 K88* 505.27 (SEQ ID NO: 38) 2 MoxIFAGIKK M80-K87 393.21 4 2 KT86 2 24.0 4.20E-07  6.4 Å 31 (SEQ ID NO: 54) KGER K88-R91 K88* MS m/z Expect- Dis- Identified Peptide AA m/z Δ Mod. sequenced Peptide ation tance In Other Type Sequence Location (Observed) z (PPM) Position in MS3 z Score Value (Cα-Cα) Refs 0,0 GGKHKTGPNLGHLFGR G23-R38 507.74 4 -2 K_(A)28, K_(A)27 446.74 4 28.0 1.10E-06 — — (SEQ ID NO: 23) 0,1 YIPGTKMoxIFAGIKKK Y74-588 682.34 3 1 K_(A)79, K_(A)86,  635.67 3 24.6 3.60E-05 — — (SEQ ID NO: 121) K_(T)87 0,1 MoxIFAGIKKKGER M80-R91 576.61 3 2 K_(A)86, K_(A)87, 529.94 3 31.8 1.20E-05 — — (SEQ ID NO: 123) K_(T)88 0,1 MoxIFAGIKKKGER M80-R91 864.41 2 1 KT86, K_(A)87, 794.41 2 34.0 2.00E-08 — — (SEQ ID NO: 323) K_(A)88 0,2 Ac-GDVEKGKK G1-K8 899.40 2 1 K5, K7* ~11.3 Å — (SEQ ID NO: 11) KATNE K100-E104 K_(A)100 616.29 1 14.2 2.60E-08 (SEQ ID NO: 42) 0,2 GKK G6-K8 469.04 5 0 K7* ~18.7 Å — GGKHKTGPNLHGLFGR G23-R38 K_(A)25,  446.74 4 22.3 4.20E-06 (SEQ ID NO: 23) K_(A)27 0,2 GKKIFVQK G6-K13 519.28 3 2 K_(T)7,  KA8 544.30 2 23.1 1.90E-05 ~15.3 Å — (SEQ ID NO: 124) KK K87-K88 K87* 1,2 Ac-GDVEVGK G1-K7 828.40 3 0 K_(T)5 860.38 1 19.5 3.20E-05 ~13.8 Å — (SEQ ID NO: 32) MoxIFAGIKKKGER M80-R91 K_(A)86, K_(A)87, 794.41 2 36.3 2.00E-09 (SEQ ID NO: 123) K_(T)88 1,2 Ac-GDVEKGKKIFVQK G1-K13 799.06 3 2 K_(T)5, K_(T)7, 872.43 2 18.7 1.20E-04 ~12.1 Å — (SEQ ID NO: 126) K_(A)8 KATNE K100-E104 K_(A)100 616.30 1 14.2 2.40E-09 (SEQ ID NO: 42) 1,2 KYIPGTK K73-K79 839.10 3 1 K_(T)73 
892.46 1 17.6 2.00E-05 ~15.3 Å — (SEQ ID NO: 53) MoxIFAGIKKKGER M80-R91 K_(A)86, K_(T)87, 794.41 2 36.3 2.00E-09 (SEQ ID NO: 323) K_(A)88 2,2 Ac-GDVEKGKK G1-K8 599.79 4 0 K5, K7* ~14.38  Å — (SEQ ID NO: 11) KKGER K87-R91 K_(A)87 336.20 2 14.8 1.50E-04 ~11.3  Å (SEQ ID NO: 13) KATNE K100-E104 K100* (SEQ ID NO: 42)
*Peptide fragments containing these sites were not sequenced by MS3.
**These intra-linked peptides were identified by MS/MS.
#These MS3 data were considered due to the presence of other lines of evidence for identifying the cross-linked peptides.
^(‡)They were identified from different charged fragment pair ions by MS3.
Note: Type 0: dead-end; Type 1: intra-linked; Type 0,1; 0,2; 1,2; 2,2: multi-linked. All of the peptides displayed characteristic fragment pairs. All of the cross-linked peptides were identified by Link-Finder, Batch-Tag and MS-Bridge.

TABLE 4  MS m/z Expect- Identified Peptide AA m/z Δ Mod. sequenced Peptide ation in other Type Sequence Location (Observed) z (PPM) Position in MS3 z Score Value Refs 0 MQIFVKTLTGK M1-K11 721.38 2  9 K_(T)6 676.36 2 30.1 5.40E-08 19, 38 (SEQ ID NO: 127) 0 AKIQDK A28-K33 439.72 2  7 K_(T)29 394.71 2 18.0 2.40E-04 — (SEQ ID NO: 128) 0 LIFAGKQLEDGR L43-R54 761.89 2 10 K_(T)48 716.87 2 35.1 1.10E-07 19, 38 (SEQ ID NO: 60) 0 LIFAGKQLEDGRTLSDYNIQK L43-K62 862.44 3  8 K_(T)48 832.43 3 34.1 1.20E-07 — (SEQ ID NO: 129) 0 TLSDYNIQKESTLHLVLR T55-R72 769.40 3 10 K_(A)63 728.73 3 36.1 1.40E-07 19, 38 (SEQ ID NO: 64) MS m/z Expect- Identified Peptide AA m/z Δ Mod. sequenced Peptide ation Distance In Other Type Sequence Location (Observed) z (PPM) Position in MS3 z Score Value (Cα-Cα) Refs 1 AKIQDKEGIPPDQQR A28-R42 940.97 2 5 K29, K33 940.97 28.5 4.40E-07 6.42 Å 19 (SEQ ID NO: 130) MS m/z Expect- Identified Peptide AA m/z Δ Mod. sequenced Peptide ation Distance In Other Type Sequence Location (Observed) z (PPM) Position in MS3 z Score Value (Cα-Cα) Refs 2 TLTGKTITLEVEPSDTIENVK T7-K27 993.01 4 5 K11* 13.3 Å 38 (SEQ ID NO: 57) IQDKEGIPPDQQR I30-R42 K_(A)33  789.41 2 28.6 3.20E-08 (SEQ ID NO: 58) 2 LIFAGKQLEDGR L43-R54 713.38 4 5 K_(A)48  700.88 2 39.2 1.00E-08 15.3 Å 19 (SEQ ID NO: 60) LIFAGKQLEDGR L43-R54 K_(A)48  716.87 2 36.4 1.90E-08 (SEQ ID NO: 60) 2 LIFAGKQLEDGR L43-R54 909.24 4 9 K_(A)48  700.89 2 35.5 1.80E-08 15.4 Å 19, 38 (SEQ ID NO: 60) TLSDYNIQKESTLHLVLR T55-R72 K_(T)63 1108.58 2 31.3 1.20E-08 (SEQ ID NO: 64)

TABLE 5 MS m/z Expect- Peptide AA m/z Δ Mod. sequenced Peptide ation Type Sequence Subunit Loctation (Observed z (PPM) Position in MS3 z Score Value 0 AKAEAAEFR α1(PRS2/SCL1) A97-R105 584.77 2 -1 K_(T)98 539.75 2 35.0 1.50E-04 (SEQ ID NO: 131) 0 VLVDKSR α2 (PRE8) V84-R90 496.76 2 0 K_(A)88 435.76 2 23.8 4.60E-04 (SEQ ID NO: 72) 0 TFLEKR α2 (PRE8) T173-R178 485.24 2 1 K_(A)177 424.24 2 22.9 3.30E-04 (SEQ ID NO: 132) 0 KVTSTLLEQDTSTEK α3 (PRE9) K51-65 928.45 2 0 K_(A)51 867.45 2 47.2 3.50E-09 (SEQ ID NO: 133) 0 STLKLQDTR α4 (PRE6) S50-R58 619.31 2 1 K_(A)53 558.31 2 36.3 3.90E-05 (SEQ ID NO: 134) 0 ITPSKVSK α4 (PRE6) I59-K66 518.27 2 1 K_(T)63 473.26 2 21.3 2.30E-03 (SEQ ID NO: 135) 0 ILIEKAR α4 (PRE6) I84-R90 509.78 2 -1 K_(T)88 464.77 2 27.4 1.40E-03 (SEQ ID NO: 136) 0 NSKTVR α4 (PRE6) N157-R172 440.71 2 1 K_(T)176 395.70 2 22.1 5.90E-03 (SEQ ID NO: 85) 0 EFLEKNYDR α4 (PRE6) E173-R181 695.30 2 -1 K_(T)177 650.29 2 30.9 1.40E-05 (SEQ ID NO: 83) 0 TAELIKELK α5 (PUP2) T236-K244 610.82 2 -4 K_(T)241 565.81 2 36.3 1.80E-04 (SEQ ID NO: 137) 0 KLAVER α6 (PRE5) K12-R107 446.23 2 2 K_(A)102 385.23 2 18.1 3.00E-04 (SEQ ID NO: 305) 0 LLVPQNKVK α7 (PRE10) L58-K66 607.84 2 1 K_(T)63 562.83 2 24.2 9.20E-05 (SEQ ID NO: 138) 0 AELEKLVDHHPEGLSAR α7 (PRE10) A174-R190 693.00 3 -2 K_(T)178 663.00 2 33.5 6.70E-06 (SEQ ID NO: 109) 0 EAVKQAAK α7 (PRE10) E191-K198 510.76 2 2 K_(T)194 465.74 2 26.9 7.10E-04 (SEQ ID NO: 139) 0 YKTNLYK β3 (PUP3) Y69-K75 553.27 2 3 K_(T)70 508.25 2 25.7 9.60E-05 (SEQ ID NO: 82) 0 TNLYKLK β3 (PUP3) T71-K77 528.27 2 -5 K_(A)75 467.27 2 25.9 2.50E-03 (SEQ ID NO: 140) 0 QELAKSIR β4 (PRE1) Q86-R93 560.79 2 2 K_(A)90 499.79 2 22.5 4.00E-03 (SEQ ID NO: 141) 0 IVDKDGIR β4 (PRE1) I183-R190 546.27 2 1 K_(T)186 501.26 2 30.6 1.40E-03 (SEQ ID NO: 142) 0 FKNSVK β6 (PRE7)^(†) F59-K64 449.72 2 1 K_(T)60 404.71 2 19.0 1.90E-02 (SEQ ID NO: 103) 0 KLSINSAAR β6 (PRE7)^(†) K74-R82 568.29 2 3 K_(A)74 507.29 2 32.8 2.00E-04 (SEQ ID NO: 143) 0 KEFYELK β6 (PRE7)^(†) K205-K211 566.77 
2 2 K_(A)205 505.77 2 24.7 5.40E-03 (SEQ ID NO: 144) MS m/z Expect- Peptide AA m/z Δ Mod. sequenced Peptide ation Distance Type Sequence Subunit Location (Observed) z (PPM) Position in MS3 z Score Value (Cα-Cα) 2 ATATGPKQQEITTNLENHFK α1(PRS2/SCL1) A168-K187 595.10 5 2 KA174  571.29 4 24.5 2.90E-04 14.8  Å (SEQ ID NO: 66) KVPDK α1(PRS2/SCL1) K58-K62 KT58  672.34 1 12.3 0.71** (SEQ ID NO: 68) 2 KVAHTSYK α2 (PRE8) K91-K98 477.51 4 2 KT91  510.25 2 29.9 7.60E-05  5.1  Å (SEQ ID NO: 70) VLVLDKSR α2 (PRE8) V84-R90 KA88  435.76 2 27.6 2.50E-03 (SEQ ID NO: 72) KVAHTSYK α2 (PRE8) K91-K98 382.21 5 1 KA91  329.85 3 19.3 1.90E-02 (SEQ ID NO: 70) VLVDKSR α2 (PRE8) V84-R90 KT88  451.74 2 25.4 2.50E-04 (SEQ ID NO: 72) 2 IFKPQEIK α3 (PRE9) I229-K236 514.03 4 0 KT231  544.80 2 23.6 1.50E-02 14.2  Å (SEQ ID NO: 74) LYKLNDK α3 (PRE9 L66-K72 KA68  474.26 2 25.5 5.50E-03 (SEQ ID NO: 86) 2 IHAQNYLKTYNEDIPVEILVR α3 (PRE9) I93-R113 904.47 4 1 KT100 1307.68 2 26.6 7.90E-05 10.6  Å (SEQ ID NO: 78) YKTNLYK β3 (PUP3)  Y69-K75 KA70  492.27 2 23.9 3.00E-03 (SEQ ID NO: 80) 2 IHAQNYLKTYNEDIPVEILVR α3 (PRE9) I93-R113 723.78 5 5 K100* (SEQ ID NO: 78) YKTNLYK β3 (PUP3)  Y69-K75 KA70  492.27 2 24.2 2.90E-03 (SEQ ID NO: 80) 2 EFLEKNYDR α4 (PRE6) E173-R181 692.33 3 2 KA177*  634.30 2 23.6 2.60E-04 13.1  Å (SEQ ID NO: 83) NSKTVR α4 (PRE6) N167-R172 KA169  379.71 2 22.6 2.80E-03 (SEQ ID NO: 85) EFLEKNYDR α4 (PRE6) E173-R181 519.50 4 2 KT177  650.29 2 33.2 1.70E-05 (SEQ ID NO: 83) NSKTVR α4 (PRE6) N167-R172 KA169  379.71 2 22.6 2.80E-03 (SEQ ID NO: 85) 2 ILKQVMEEK α5 (PUP2) I203-K211 641.01 3 0 KT205  602.31 2 29.2 3.50E-03 10.5 Å (SEQ ID NO: 87) ELKEK α5 (PUP2) E242-K246 K244* (SEQ ID NO: 89) ILKQVMEEK α5 (PUP2) I203-K211 481.01 4 0 KT205  602.31 2 27.6 2.60E-04 (SEQ ID NO: 87) ELKEK α5 (PUP2) E242-K246 K244* (SEQ ID NO: 89) 2 SYKFPR β2 (PUP1)^(†) S202-R207 539.26 3 1 KA204  426.23 2 23.1 6.40E-03 12.1  Å (SEQ ID NO: 90) EEKQK β2 (PUP1)^(†) E197-K201 KT199  747.34 1 10.4 0.3** (SEQ ID NO: 92) SYKFPR β2 
(PUP1)^(†) S202-R207 404.70 4 2 KT204  442.21 2 21.1 8.20E-04 (SEQ ID NO: 90) EEKQK β2 (PUP1)^(†) E197-K201 K199* (SEQ ID NO: 92) 2 YKTNLYK β3 (PUP3) Y69-K75 587.64 3 2 KA70^(†)  492.26 2 23.8 4.60E-04 10.7  Å (SEQ ID NO: 30) LKEER β3 (PUP3) L76-R80 KA77^(†)  364.70 2 17.0 2.70E-02 (SEQ ID NO: 94) YKTNLYK β3 (PUP3) Y69-K75 440.98 4 2 KT70  508.25 2 25.7 1.10E-04 (SEQ ID NO: 30) LKEER β3 (PUP3) L76-R80 KA77  364.70 2 16.5 8.40E-03 (SEQ ID NO: 94) 2 LGSQSLGVSNKFEK β3 (PUP3) L29-K42 793.07 3 2 KA39  774.41 2 42.0 5.30E-07 13.2  Å (SEQ ID NO: 29) YLKMoxR β3 (PUP3) Y199-R203 KT201  406.69 2 16.2 1.10E-03 (SEQ ID NO: 97) LGSQSLGVSNKFEK β3 (PUP3) L29-K42 595.05 4 2 KT39  790.40 2 40.7 8.40E-07 (SEQ ID NO: 29) YLKMoxR β3 (PUP3) Y199-R203 KA201  390.71 2 18.1 6.10E-03 (SEQ ID NO: 97) 2 NKPWLYQIDYLGTK β4 (PRE1) N112-K125 833.92 4 0 KA113  868.45 2 32.0 9.50E-08 19.1  Å (SEQ ID NO: 27) LGSQSLGVSNKFEK β3 (PUP3) L29-K42 KT39  790.39 2 26.5 3.90E-05 (SEQ ID NO: 29) 2 VQDSVILASSKAVTR β4 (PRE1) V9-R23 633.74 5 1 KA19  543.30 3 23.0 4.90E-03  7.8  Å (SEQ ID NO: 99) GISVLKDSDDKTR β4 (PRE1) G24-R36 KT29  760.38 2 35.4 2.40E-05 (SEQ ID NO: 101) 2 FKNSVK β6 (PRE7)^(†) F59-K64 532.29 3 2 KT60  808.40 1 16.2 2.00E-02 16.2  Å (SEQ ID NO: 103) KLAVER α6 (PRE5) K102-R107 KA102  385.23 2 21.2 9.80E-04 (SEQ ID NO: 105) 4 2 FKNSVK β6 (PRE7)^(†) F59-K64 399.47 KT60  404.71 2 16.5 1.10E-02 (SEQ ID NO: 103) KLAVER α6 (PRE5) K102-R107 KA102  385.23 2 18.3 1.60E-04 (SEQ ID NO: 105) 2 NQYEPGTNGKVK β6 (PRE7)^(†) N149-K160 659.68 3 0 KA158  694.84 2 29.8 4.20E-05  9.8 Å (SEQ ID NO: 106) KPLK β6 (PRE7)^(†) K161-K164 K161* (SEQ ID NO: 108) NQYEPGTNGKVK β6 (PRE7)^(†) N149-K160 495.01 4 2 KT158  710.83 26.3 3.00E-04 (SEQ ID NO: 106) KPLK β6 (PRE7)^(†) K161-K164 K161* (SEQ ID NO: 108) *Peptide fragment containing theses sites were not sequenced by MS3. **The peptide identification was above 1% false positive rat but MS3 was validated manually. 
^(‡)They were identified from different fragment pair ions by MS3. ^(†)Mature sequence from crystal data was used for data analysis. Note: Type 0: dead-end All of the peptides displayed characteristic fragment pairs. All of the cross-linked peptides were identified by Link-Finder, Batch-tag, MS-Bridge.

TABLE 6
SEQ ID NO: Sequence
SEQ ID NO: 1 IEAEKGR
SEQ ID NO: 2 Ac-IEAEKGR
SEQ ID NO: 3 Ac-IEAEK_(A)GR
SEQ ID NO: 4 Ac-IEAEK_(S)GR
SEQ ID NO: 5 ASQKRPSQRHG
SEQ ID NO: 6 Ac-ASQKRPSQRHG
SEQ ID NO: 7 RPKPQQF
SEQ ID NO: 8 RPK_(A)PQQF
SEQ ID NO: 9 RPK_(DN)PQQF
SEQ ID NO: 10 GDVEKGKK
SEQ ID NO: 11 Ac-GDVEKGKK
SEQ ID NO: 12 Ac-GDVEK_(A)GKK
SEQ ID NO: 13 KKGER
SEQ ID NO: 14 K_(T)KGER
SEQ ID NO: 15 K_(A)KGER
SEQ ID NO: 16 HKTGPNLHGLFGR
SEQ ID NO: 17 HK_(A)TGPNLHGLFGR
SEQ ID NO: 18 HK_(T)TGPNLHGLFGR
SEQ ID NO: 19 KTGQAPGFSYTDANK
SEQ ID NO: 20 K_(DN)TGQAPGFSYTDANK
SEQ ID NO: 21 K_(A)TGQAPGFSYTDANK
SEQ ID NO: 22 K_(T)TGQAPGFSYTDANK
SEQ ID NO: 23 GGKHKTGPNLHGLFGR
SEQ ID NO: 24 GGK*HK*TGPNLHGLFGR
SEQ ID NO: 25 GGK_(A)HK_(T)TGPNLHGLFGR
SEQ ID NO: 26 GGK_(T)HK_(A)TGPNLHGLFGR
SEQ ID NO: 27 NKPELYQIDYLGTK
SEQ ID NO: 28 NK_(A)PELYQIDYLGTK
SEQ ID NO: 29 LGSQSLGVSNKFEK
SEQ ID NO: 30 LGSQSLGVSNK_(T)FEK
SEQ ID NO: 31 GDVEKGK
SEQ ID NO: 32 Ac-GDVEKGK
SEQ ID NO: 33 Ac-GDVEK_(T)GK
SEQ ID NO: 34 Ac-GDVEK_(A)GK
SEQ ID NO: 35 KIFVQK
SEQ ID NO: 36 K_(A)IFVQK
SEQ ID NO: 37 K_(T)IFVQK
SEQ ID NO: 38 KGER
SEQ ID NO: 39 EDLIAYLKK
SEQ ID NO: 40 EDLIAYLK_(A)K
SEQ ID NO: 41 EDLIAYLK_(T)K
SEQ ID NO: 42 KATNE
SEQ ID NO: 43 K_(A)ATNE
SEQ ID NO: 44 GGKHK
SEQ ID NO: 45 GGK_(T)HK
SEQ ID NO: 46 TGQAPGFSYTDANKNK
SEQ ID NO: 47 TGQAPGFSYTDANK_(T)NK
SEQ ID NO: 48 YIPGTKMIFAGIK
SEQ ID NO: 49 YIPGTKM_(OX)IFAGIK
SEQ ID NO: 50 YIPGTK_(A)M_(OX)IFAGIK
SEQ ID NO: 51 KYIPGTK
SEQ ID NO: 52 K_(T)YIPGTK
SEQ ID NO: 53 MIFAGIKK
SEQ ID NO: 54 M_(OX)IFAGIKK
SEQ ID NO: 55 M_(OX)IFAGIK_(T)K
SEQ ID NO: 56 MIFAGIK_(T)K
SEQ ID NO: 57 TLTGKTITLEVEPSDTIENVK
SEQ ID NO: 58 IQDKEGIPPDQQR
SEQ ID NO: 59 IQDK_(A)EGIPPDQQR
SEQ ID NO: 60 LIFAGKQLEDGR
SEQ ID NO: 61 LIFAGK_(A)QLEDGR
SEQ ID NO: 62 LIFAGK_(T)QLEDGR
SEQ ID NO: 63 LIFAGK⁴⁸QLEDGR
SEQ ID NO: 64 TLSDYNIQKESTLHLVLR
SEQ ID NO: 65 TLSDYNIQK_(T)ESTLHLVLR
SEQ ID NO: 66 ATATGPKQQEITTNLENHFK
SEQ ID NO: 67 ATATGPK_(A)QQEITTNLENHFK
SEQ ID NO: 68 KVPDK
SEQ ID NO: 69 K_(T)VPDK
SEQ ID NO: 70 KVAHTSYK
SEQ ID NO: 71 K_(T)VAHTSYK
SEQ ID NO: 72 VLVDKSR
SEQ ID NO: 73 VLVDK_(A)SR
SEQ ID NO: 74 IFKPQEIK
SEQ ID NO: 75 IFK_(T)PQEIK
SEQ ID NO: 76 LYKLNDK
SEQ ID NO: 77 LYK_(A)LNDK
SEQ ID NO: 78 IHAQNYLKTYNEDIPVEILVR
SEQ ID NO: 79 IHAQNYLK_(T)TYNEDIPVEILVR
SEQ ID NO: 80 YKTNLYK
SEQ ID NO: 81 YK_(A)TNLYK
SEQ ID NO: 82 YK_(T)TNLYK
SEQ ID NO: 83 EFLEKNYDR
SEQ ID NO: 84 EFLEK_(A)NYDR
SEQ ID NO: 85 NSKTVR
SEQ ID NO: 86 NSK_(Ak)TVR
SEQ ID NO: 87 ILKQVMEEK
SEQ ID NO: 88 ILK_(T)QVMEEK
SEQ ID NO: 89 ELKEK
SEQ ID NO: 90 SYKFPR
SEQ ID NO: 91 SYK_(A)FPR
SEQ ID NO: 92 EEKQK
SEQ ID NO: 93 EEK_(T)QK
SEQ ID NO: 94 LKEER
SEQ ID NO: 95 LK_(A)EER
SEQ ID NO: 96 YLKMR
SEQ ID NO: 97 YLKM_(OX)R
SEQ ID NO: 98 YLK_(A)M_(OX)R
SEQ ID NO: 99 VQDSVILASSKAVTR
SEQ ID NO: 100 VQDSVILASSK_(Ak)AVTR
SEQ ID NO: 101 GISVLKDSDDKTR
SEQ ID NO: 102 GISVLK_(T)DSDDKTR
SEQ ID NO: 103 FKNSVK
SEQ ID NO: 104 FK_(A)NSVK
SEQ ID NO: 105 KLAVER
SEQ ID NO: 106 NQYEPGTNGKVK
SEQ ID NO: 107 NQYEPGTNGK_(A)VK
SEQ ID NO: 108 KPLK
SEQ ID NO: 109 AELEKLVDHHPEGLSAR
SEQ ID NO: 110 AELEK_(DN)LVDHHPEGLSAR
SEQ ID NO: 111 AELEK_(A)LVDHHPEGLSAR
SEQ ID NO: 112 AELEK_(T)LVDHHPEGLSAR
SEQ ID NO: 113 KYIPGTKMIFAGIK
SEQ ID NO: 114 KYIPGTKMoxIFAGIK
SEQ ID NO: 115 KYIPGTKMIFAGIKK
SEQ ID NO: 116 KYIPGTKMoxIFAGIKK
SEQ ID NO: 117 MIFAGIKKK
SEQ ID NO: 118 MoxIFAGIKKK
SEQ ID NO: 119 EDLIAYLKKATNE
SEQ ID NO: 120 YIPGTKMIFAGIKKK
SEQ ID NO: 121 YIPGTKMoxIFAGIKKK
SEQ ID NO: 122 MIFAGIKKKGER
SEQ ID NO: 123 MoxIFAGIKKKGER
SEQ ID NO: 124 GKKIFVQK
SEQ ID NO: 125 GDVEKGKKIFVQK
SEQ ID NO: 126 Ac-GDVEKGKKIFVQK
SEQ ID NO: 127 MQIFVKTLTGK
SEQ ID NO: 128 AKIQDK
SEQ ID NO: 129 LIFAGKQLEDGRTLSDYNIQK
SEQ ID NO: 130 AKIQDKEGIPPDQQR
SEQ ID NO: 131 AKAEAAEFR
SEQ ID NO: 132 TFLEKR
SEQ ID NO: 133 KVTSTLLEQDTSTEK
SEQ ID NO: 134 STLKLQDTR
SEQ ID NO: 135 ITPSKVSK
SEQ ID NO: 136 ILIEKAR
SEQ ID NO: 137 TAELIKELK
SEQ ID NO: 138 LLVPQKNVK
SEQ ID NO: 139 EAVKQAAK
SEQ ID NO: 140 TNLYKLK
SEQ ID NO: 141 QELAKSIR
SEQ ID NO: 142 IVDKDGIR
SEQ ID NO: 143 KLSINSAAR
SEQ ID NO: 144 KEFYELK
Ac-Acetyl
Xaa_(A)-Alkene modification
Xaa_(Ak)-Alkane modification
Xaa_(DN)-Dead-end modification
Xaa_(T)-Thiol modification
Xaa_(S)-Sulfenic acid modification
Xaa_(OX)-Oxidation
*-Intra-peptide linkage
Xaa⁴⁸-Inter-peptide linkage

DETAILED DESCRIPTION

In a first aspect of the invention, a new crosslinking compound is provided and has the formula:

where X is selected from the group consisting of

wherein R is methyl or ethyl, and

A particularly preferred cross-linking agent is bis(2,5-dioxopyrrolidin-1-yl) 3,3′-sulfinyldipropanoate (“DSSO”):

In a second aspect of the invention, the new cross-linking agents are used to facilitate mapping of protein-protein interactions of protein complexes. In one embodiment, the method comprises the steps of providing an MS-cleavable cross-linker having the formula described above; forming a cross-linked protein complex by cross-linking proteins with the MS-cleavable cross-linker; forming cross-linked peptide fragments that are chemically bound to the MS-cleavable cross-linker by digesting the cross-linked protein complex with an enzyme such as trypsin; and using mass spectrometry (MS) and MS^(n) analysis to identify the protein and/or peptide fragments. For convenience, in the discussion that follows, reference is sometimes made to the particular cross-linker, DSSO. It will be understood, however, that any of the other MS-cleavable cross-linkers that fit the general formula may also be used. Thus, DSSO fragments, DSSO remnants, DSSO cross-linked peptides, and like language apply equally to other cross-linkers as described herein.

Abbreviations

MS: mass spectrometry
MS/MS: tandem mass spectrometry
MS^(n): multi-stage tandem mass spectrometry (n=2, 3, . . . )
LC MS^(n): liquid chromatography multi-stage tandem mass spectrometry
CID: collision induced dissociation
DSSO: bis(2,5-dioxopyrrolidin-1-yl) 3,3′-sulfinyldipropanoate
NMR: nuclear magnetic resonance

The CID-induced separation of inter-linked peptides in MS/MS permits MS³ analysis of single peptide chain fragment ions with defined modifications (due to diamide remnants) for easy interpretation and unambiguous identification using existing database searching tools. Integration of data analyses from three generated datasets (MS, MS/MS and MS³) allows high confidence identification of DSSO cross-linked peptides. The efficacy of the newly developed DSSO-based cross-linking strategy has been demonstrated using model peptides and proteins. In addition, this method has been successfully employed for structural characterization of the yeast 20 S proteasome complex. In total, 13 non-redundant inter-linked peptides of the 20 S proteasome have been identified, representing the first application of an MS-cleavable cross-linker for the characterization of a multi-subunit protein complex. Given its effectiveness and simplicity, this cross-linking strategy can find a broad range of applications in elucidating structural topology of proteins and protein complexes.

In combination with new software developed for data integration, the inventors were able to identify DSSO cross-linked peptides from complex peptide mixtures with speed and accuracy. Given its effectiveness and simplicity, the inventors anticipate a broader application of this MS-cleavable cross-linker in the study of structural topology of other protein complexes using cross-linking and mass spectrometry.

Experimental Procedures

Materials and Reagents—

General chemicals were purchased from Fisher Scientific (Hampton, N.H.) or VWR International (West Chester, Pa.). Bovine heart cytochrome c (98% purity) and bovine erythrocyte ubiquitin (98% purity) were purchased from Sigma Aldrich (St. Louis, Mo.). Synthetic peptide Ac-IR7 (Ac-IEAEKGR (SEQ ID NO: 2), 98.1% purity) was synthesized by GL Biochem (Shanghai, China). Sequencing grade modified trypsin was purchased from Promega (Fitchburg, Wis.). The 20S proteasome core particle was affinity purified using a Pre1-TAP expressing yeast strain as previously described in Leggett, D. S., Hanna, J., Borodovsky, A., Crosas, B., Schmidt, M., Baker, R. T., Walz, T., Ploegh, H., and Finley, D. (2002) Multiple Associated Proteins Regulate Proteasome Structure and Function. Mol Cell. 10, 495-507.

Synthesis and Characterization of DSSO—

FIG. 2A displays a two-step synthesis scheme of DSSO with an extended spacer length of 10.1 Å. Sulfide S-1 was first synthesized by mixing 3,3′-thiodipropionic acid (2.50 g, 14.0 mmol) with N-hydroxysuccinimide (3.30 g, 28.6 mmol) in dioxane (60 ml). The reaction mixture was stirred under an atmosphere of argon, and a solution of DCC (5.79 g, 28.1 mmol) in dioxane (20 ml) was added drop-wise. After 12 h, the insoluble urea was filtered from the reaction. The filtrate was concentrated to form a white solid. The solid residue was washed with cold diethyl ether followed by cold hexanes. After drying under reduced pressure, 5.20 g (70%) of sulfide S-1 was recovered and used without further purification: ¹H (500 MHz, DMSO-d6) δ 3.02 (t, J=7.0 Hz, 4H), 2.86 (t, J=7.0 Hz, 4H), 2.81 (s, 8H); ¹³C (125 MHz, DMSO-d6) δ 170.1, 167.8, 31.4, 25.6, 25.4; IR (KBr pellet) 1801, 1732 cm⁻¹; HRMS (ES/MeOH) m/z calcd for C₁₄H₁₆N₂O₈SNa [M+Na]⁺ 395.0525; found 395.0531.

To synthesize DSSO, a solution of sulfide S-1 (0.600 g, 1.61 mmol) in CHCl₃ (30 ml) at 0° C. was mixed with a solution of m-chloroperbenzoic acid (MCPBA) (0.371 g, 1.61 mmol) in CHCl₃ (10 ml). The reaction product was filtered and washed with cold CHCl₃ (10 ml) and cold MeOH (10 ml). The filtrate was cooled to −10° C. for 1 h, washed again with CHCl₃ and MeOH, and dried under reduced pressure to yield 0.400 g (64%) of DSSO: ¹H (600 MHz, DMSO-d6) δ 3.28-3.21 (m, 2H), 3.17-3.13 (m, 4H), 3.08-2.99 (m, 2H), 2.88-2.75 (s, 8H); ¹³C (125 MHz, DMSO-d6) δ 170.08, 167.74, 44.62, 25.46, 23.41; IR (KBr pellet) 2943, 1786, 1720 cm⁻¹; HRMS (ES/MeOH) m/z calcd for C₁₄H₁₆N₂O₉SNa [M+Na]⁺ 411.0474; found 411.0471.
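The reported HRMS values can be cross-checked with a short calculation. The following is a minimal sketch, not part of the original procedure: it sums standard monoisotopic atomic masses for sulfide S-1 (C₁₄H₁₆N₂O₈S) and DSSO (taken here as C₁₄H₁₆N₂O₉S, i.e. S-1 plus one oxygen) and adds the mass of a sodium atom. The helper names `mono_mass` and `ppm_error` are illustrative, and the electron mass is assumed not to be subtracted, since that convention appears to reproduce the calculated values quoted above.

```python
# Standard monoisotopic atomic masses in Da.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
        "O": 15.9949146, "S": 31.9720707, "Na": 22.9897693}

def mono_mass(formula):
    """Monoisotopic mass of a composition given as {element: count}."""
    return sum(MONO[element] * count for element, count in formula.items())

def ppm_error(observed, calculated):
    """Relative deviation of an observed m/z from the calculated m/z, in ppm."""
    return (observed - calculated) / calculated * 1e6

SULFIDE_S1 = {"C": 14, "H": 16, "N": 2, "O": 8, "S": 1}
DSSO = {"C": 14, "H": 16, "N": 2, "O": 9, "S": 1}  # S-1 plus one oxygen (sulfoxide)

for name, formula, found in [("sulfide S-1", SULFIDE_S1, 395.0531),
                             ("DSSO", DSSO, 411.0471)]:
    # [M+Na]+ computed as M + Na; electron mass not subtracted (assumption).
    calc = mono_mass(formula) + MONO["Na"]
    print(f"{name}: calcd {calc:.4f}, found {found:.4f}, "
          f"error {ppm_error(found, calc):+.1f} ppm")
```

Under these assumptions both observed values fall within about 2 ppm of the calculated [M+Na]⁺ masses, and the 15.9949 Da difference between the two compounds corresponds to the single oxygen added on oxidation of the sulfide to the sulfoxide.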

A similar synthetic approach is used to make the other symmetric diesters identified above and having the general structure 2, where X is as defined above. Thus, the symmetric sulfide is prepared by reacting 3,3′-thiodipropionic acid with the appropriate N-hydroxyamine (e.g., a functionalized analogue of N-hydroxysuccinimide (compounds 4-7), or other N-hydroxy-functionalized heterocycle (compounds 3, 8, and 9)), and then the sulfinyl group is made by treating the symmetric sulfide with MCPBA in CHCl₃ or another appropriate solvent.

Cross-Linking of Synthetic Peptides with DSSO—

Synthetic peptides Ac-IR7, Ac-myelin and substance P were dissolved in DMSO to 1 mM and cross-linked with DSSO dissolved in DMSO in a ratio of 1:1 in the presence of 1 equivalent of diisopropylethylamine, similarly to the procedure described in Vellucci, D., et al. The cross-linked peptide solution was then diluted to 1 pmol/μl in 4% ACN, 0.1% formic acid for liquid chromatography multi-stage tandem mass spectrometry (LC MS^(n)) analysis.

Cross-Linking of Cytochrome C and Ubiquitin with DSSO—

Lyophilized bovine cytochrome c or ubiquitin was reconstituted in 1×PBS (pH 7.5) to 200 μM, 20 μl of which was mixed with 2 μl 20 mM DSSO (in DMSO) in a molar ratio of 1:10 (protein: cross-linker) for the cross-linking reaction as described in Vellucci, D., et al. The cross-linked protein was digested with trypsin (1% w/w) overnight at 37° C. The cross-linked peptide digest was then diluted to 1 pmol/μl in 4% ACN, 0.1% formic acid for LC MS^(n) analysis.

Cross-Linking of the Yeast 20 S Proteasome with DSSO—

Affinity purified yeast 20S proteasome complex was concentrated by Microcon (Billerica, Mass.) to ~1.2 μM in 1×PBS buffer (pH 7.5). Typically 50 μl of the 20S proteasome was cross-linked with 3 μl DSSO (20 mM) dissolved in DMSO (final concentration ~1 mM) at a molar ratio of 1:1000 (protein:cross-linker). Cross-linking was performed for half an hour or overnight and quenched with excess ammonium bicarbonate buffer. Cysteine residues were reduced with 5 mM DTT at 56° C. for 30 min, and alkylated with 10 mM chloroacetamide for 30 min at room temperature. The cross-linked protein complex was digested with trypsin (2% w/w) overnight at 37° C. Digested peptides were desalted by C18 OMIX ZipTip (Varian, Palo Alto, Calif.) prior to LC MS^(n) analysis.
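The stated molar ratios follow directly from the volumes and concentrations given for the cytochrome c/ubiquitin and 20S proteasome reactions. The sketch below is an illustrative check only; the function names `nmol` and `crosslink_mix` are hypothetical, and the final cross-linker concentration is computed by assuming the two volumes are simply additive.

```python
def nmol(volume_ul, conc_um):
    """Amount in nmol from a volume (microlitres) and a concentration (micromolar)."""
    return volume_ul * conc_um / 1000.0  # ul x umol/l = pmol; divide by 1000 for nmol

def crosslink_mix(protein_ul, protein_um, linker_ul, linker_mm):
    """Cross-linker:protein molar ratio and final cross-linker concentration."""
    protein_nmol = nmol(protein_ul, protein_um)
    linker_nmol = nmol(linker_ul, linker_mm * 1000.0)
    total_ul = protein_ul + linker_ul  # volumes assumed additive
    return {
        "linker_per_protein": linker_nmol / protein_nmol,
        "final_linker_mM": linker_nmol / total_ul,  # nmol/ul equals mmol/l
    }

# Cytochrome c or ubiquitin: 20 ul of 200 uM protein + 2 ul of 20 mM DSSO
print(crosslink_mix(20, 200, 2, 20))
# 20S proteasome: 50 ul of ~1.2 uM complex + 3 ul of 20 mM DSSO
print(crosslink_mix(50, 1.2, 3, 20))
```

With these inputs the first mixture gives a 1:10 protein:cross-linker ratio and the second gives 1:1000 with a final DSSO concentration slightly above 1 mM, consistent with the values stated in the text.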

For some analyses, 2-dimensional LC MS^(n) analysis was carried out. Off-line strong cation exchange (SCX) chromatography was performed as the first dimension of separation using an ÄKTA HPLC system (GE Healthcare Life Sciences, Uppsala, Sweden) as described in Kaake, R. M., et al. Each fraction was desalted by ZipTip prior to LC MS^(n) analysis.

LC MS^(n) Analysis—

LC MS^(n) analysis of DSSO cross-linked peptides was performed using an LTQ-Orbitrap XL MS (Thermo Scientific, San Jose, Calif.) with an on-line Eksigent NanoLC system (Eksigent, Dublin, Calif.). The LC separation was the same as previously described by Vellucci, D., et al. The MS^(n) method was set specifically for analyzing DSSO cross-linked peptides. Each acquisition cycle of an MS^(n) experiment included one MS scan in FT mode (350-1800 m/z, resolution of 60,000 at m/z 400) followed by two data-dependent MS/MS scans with normalized collision energy at 10 or 15% on the top two peaks from the MS scan, and then three MS³ scans operated in LTQ with normalized collision energy at 29% on the top three peaks from each of the MS/MS scans. For initial analyses, MS/MS spectra were acquired in LTQ in LC MS^(n) experiments. For automated data analysis, MS/MS spectra were obtained in FT mode (resolution of 7500).
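The structure of one acquisition cycle can be laid out as a simple data structure. The following sketch is purely illustrative and is not instrument method code: `ScanEvent` and `build_cycle` are hypothetical names, and the ordering of MS³ scans directly after their parent MS/MS scan is an assumption, since the text does not specify the exact scan order.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class ScanEvent:
    level: int                    # 1 = MS, 2 = MS/MS, 3 = MS3
    analyzer: str                 # "FT" (Orbitrap) or "LTQ" (ion trap)
    nce_percent: Optional[float]  # normalized collision energy; None for the MS scan

def build_cycle(ms2_nce=15.0, ms2_analyzer="FT", top_ms2=2, top_ms3=3) -> List[ScanEvent]:
    """One cycle: 1 MS scan, top_ms2 MS/MS scans, top_ms3 MS3 scans per MS/MS scan."""
    cycle = [ScanEvent(1, "FT", None)]
    for _ in range(top_ms2):
        cycle.append(ScanEvent(2, ms2_analyzer, ms2_nce))
        # MS3 on the top peaks of this MS/MS scan, in the LTQ at 29% NCE
        cycle.extend(ScanEvent(3, "LTQ", 29.0) for _ in range(top_ms3))
    return cycle

cycle = build_cycle()
print(len(cycle), "scan events per cycle")  # 1 + 2 + 2*3 = 9
```

Passing `ms2_analyzer="LTQ"` or `ms2_nce=10.0` would correspond to the alternative MS/MS settings mentioned above for initial analyses.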

Data Analysis of DSSO Cross-Linked Peptides—

Monoisotopic masses of parent ions and corresponding fragment ions, parent ion charge states and ion intensities from LC MS/MS and LC MS³ spectra were extracted using in-house software based on the Raw_Extract script from Xcalibur v2.4 (Thermo Scientific, San Jose, Calif.). Database searching was performed with a developmental version of Protein Prospector (v. 5.5.0, University of California, San Francisco) (http://prospector.ucsf.edu/prospector/mshome.htm) using its software suite, i.e. Batch-Tag and MS-Bridge as described in Chu, F., et al. Using in-house scripts, extracted MS³ data were reformatted such that MS³ fragment ions were directly linked to their MS/MS parent ions. For cytochrome c (P62894) and ubiquitin (P62990) analyses, database searching of MS³ spectra was performed using Batch-Tag against their accession numbers in the SwissProt.2009.09.01 database. For the 20S proteasome, the Batch-Tag search of MS³ data was performed against a decoy database consisting of a normal SGD yeast database concatenated with its reversed version (total 13490 protein entries). The mass tolerances for parent ions and fragment ions were set as ±20 ppm and 0.6 Da, respectively. Trypsin was set as the enzyme and a maximum of two missed cleavages was allowed. Protein N-terminal acetylation, methionine oxidation, and N-terminal conversion of glutamine to pyroglutamic acid were selected as variable modifications. In addition, three defined modifications on uncleaved lysines were chosen, including alkene (C₃H₂O, +54 Da), sulfenic acid (C₃H₄O₂S, +104 Da), and thiol (C₃H₂SO, +86 Da) modifications due to remnants of the cross-linker (FIG. 1). Initial acceptance criteria for peptide identification required a reported expectation value ≤0.05. For the 20S proteasome analysis, the false positive rate for peptide identification was less than 1%.
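The nominal mass shifts of the three lysine remnant modifications can be reproduced from their elemental compositions. The sketch below is an illustrative calculation using standard monoisotopic atomic masses; the names `REMNANTS`, `mono_mass` and `within_ppm` are hypothetical and do not correspond to any Protein Prospector setting.

```python
# Standard monoisotopic atomic masses in Da.
MONO = {"C": 12.0, "H": 1.00782503, "O": 15.9949146, "S": 31.9720707}

# Elemental compositions of the cross-linker remnants as given in the text.
REMNANTS = {
    "alkene": {"C": 3, "H": 2, "O": 1},
    "sulfenic_acid": {"C": 3, "H": 4, "O": 2, "S": 1},
    "thiol": {"C": 3, "H": 2, "O": 1, "S": 1},
}

def mono_mass(formula):
    """Monoisotopic mass of a composition given as {element: count}."""
    return sum(MONO[element] * count for element, count in formula.items())

def within_ppm(observed, theoretical, tol_ppm=20.0):
    """True if the observed mass lies within +/- tol_ppm of the theoretical mass."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm

for name, formula in REMNANTS.items():
    print(f"{name}: +{mono_mass(formula):.4f} Da")
```

The computed shifts round to +54, +104 and +86 Da, and the sulfenic acid and thiol remnants differ by the mass of one water molecule (about 18.0106 Da). The `within_ppm` helper mirrors the ±20 ppm parent ion tolerance stated above.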

The Link-Finder program (http://www.ics.uci.edu/~baldig/Link-Finder/) was developed to search MS/MS data and identify the list of putative DSSO inter-linked and dead-end products based on their unique MS fragmentation patterns as illustrated in FIG. 2 (see the Results section for details). For example, one embodiment of the invention includes identifying the MS/MS data that display characteristic fragmentation profiles of DSSO cross-linked peptides based on the unique mass relationships between parent ions of cross-linked peptides and their fragment ions to obtain an MS/MS result including a list of parent ions corresponding to cross-linked peptide candidates (e.g., the putative or potential identities of the cross-linked peptides being analyzed). In one embodiment, analysis of the MS/MS data is carried out using the Link-Finder program. Monoisotopic masses and charges of parent ions measured in MS scans for those putative cross-linked peptides identified by the Link-Finder program were subsequently submitted to MS-Bridge to determine cross-linked peptide sequences by mass mapping with a given cross-linker (i.e. DSSO) and protein sequences (see Chu, F., et al.). For example, one embodiment of the invention further includes mass mapping the MS data using the list of parent ions corresponding to the cross-linked peptide candidates and the MS-cleavable cross-linker against known protein sequences to obtain an MS result comprising possible cross-linked peptide sequences. In one embodiment, the mass mapping is carried out using MS-Bridge. The parent mass error for the MS-Bridge search was set as ±10 ppm and only one cross-link was allowed in the cross-linked peptides for the general search. All three types of cross-linked peptides (Schilling, B., et al.), i.e. inter-linked (type 2), intra-linked (type 1) and dead-end modified (type 0), can be computed and matched in MS-Bridge (see Chu, F., et al.).
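The kind of mass relationship exploited at this step can be illustrated with a short sketch. This is not the Link-Finder implementation; it is a minimal example under stated assumptions: cleavage of an inter-linked peptide yields one peptide carrying the alkene remnant and one carrying the sulfenic acid remnant, so their neutral masses sum to the parent mass, and the sulfenic acid fragment may lose water to give the thiol form, so an alkene/thiol pair sums to the parent mass minus H₂O. The function names and the 0.05 Da tolerance are hypothetical. The example values are the rounded m/z values listed in Table 3 for the Ac-GDVEKGK/KIFVQK inter-link, plus one made-up unrelated ion.

```python
from itertools import combinations

PROTON = 1.007276  # Da
WATER = 18.010565  # Da

def neutral_mass(mz, z):
    """Neutral monoisotopic mass from m/z and charge of a protonated ion."""
    return mz * z - z * PROTON

def find_fragment_pairs(parent, fragments, tol_da=0.05):
    """Return fragment pairs whose summed neutral mass matches the parent.

    parent and each fragment are (m/z, charge) tuples.
    """
    parent_mass = neutral_mass(*parent)
    hits = []
    for a, b in combinations(fragments, 2):
        pair_mass = neutral_mass(*a) + neutral_mass(*b)
        if abs(parent_mass - pair_mass) <= tol_da:
            hits.append((a, b, "alkene + sulfenic acid"))
        elif abs(parent_mass - WATER - pair_mass) <= tol_da:
            hits.append((a, b, "alkene + thiol"))
    return hits

parent = (565.30, 3)                # inter-linked parent ion from Table 3
fragments = [(860.38, 1),           # Ac-GDVEK(T)GK fragment
             (408.75, 2),           # K(A)IFVQK fragment
             (500.00, 2)]           # hypothetical unrelated ion
for hit in find_fragment_pairs(parent, fragments):
    print(hit)
```

The loose tolerance is chosen only because the tabulated m/z values are rounded to two decimals; with full-precision FT data a ppm-based tolerance would be the more natural choice.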

The search results from the Link-Finder, Batch-Tag and MS-Bridge programs were integrated using in-house scripts to compile a list of cross-linked peptides identified with high confidence. The final results were validated manually by examining both the MS/MS spectra and the MS³ spectra.

Results

Development of a Novel Sulfoxide Containing MS-Cleavable Cross-Linker—

In order to develop a robust MS-cleavable cross-linking reagent, the incorporated MS-labile bond must have the ability to selectively and preferentially fragment prior to peptide backbone breakage independent of peptide charges and sequences. It is well documented that methionine sulfoxide containing peptides have preferential fragmentation at the C—S bond adjacent to the sulfoxide during collision induced dissociation (CID) analysis (see Reid, G. E., Roberts, K. D., Kapp, E. A., and Simpson, R. I. (2004) Statistical and Mechanistic Approaches to Understanding the Gas-Phase Fragmentation Behavior of Methionine Sulfoxide Containing Peptides. J Proteome Res 3, 751-759), and this fragmentation is dominant because the C—S bond is much more labile than peptide bonds. Such labile fragmentation has often been observed as the loss of 64 Da (—SOCH₄) from oxidized methionine containing peptides in our routine peptide analysis. Therefore, the inventors expect that if a sulfoxide is incorporated in the spacer region of an NHS ester, the C—S bond adjacent to the sulfoxide will be MS-labile and prone to preferential fragmentation. To test this, the inventors have designed and synthesized a CID cleavable cross-linker having a general formula of 3,3′-sulfinylbispropanoic acid, also known as 3,3′-sulfinyldipropanoic acid. The molecular formula is C₆H₁₀O₅S, and it has a general structure as shown in General Structure 2 of FIG. 1 where X=—OH. More specific cleaving agents are shown in FIG. 1, including Compound 1, namely Disuccinimidyl Sulfoxide (sometimes hereinafter referred to as “DSSO”), which is one exemplary compound of the invention. Other compounds in which X in General Structure 2 is substituted are shown as Compounds 3-6 in FIG. 1.
Hereinafter, while reference is made to DSSO, other MS-cleavable cross-linkers having the general structure as shown in General Structure 2 of FIG. 1 are included as MS-cleavable cross-linkers of the invention. Turning back to disuccinimidyl sulfoxide (DSSO), it contains two NHS ester functional groups and two symmetric MS-labile C—S bonds adjacent to the sulfoxide (FIG. 2A). DSSO has a spacer length of 10.1 Å, making it well suited for detecting protein interaction interfaces of protein complexes and generating highly informative distance constraints. In comparison to existing MS-cleavable cross-linkers, DSSO can be easily synthesized in a two-step process as shown in FIG. 2A.

Proposed CID Fragmentation Pattern of DSSO Cross-Linked Peptides—

Three types of cross-linked peptides can be formed during the cross-linking reaction: inter-linked (type 2), intra-linked (type 1) and dead-end (type 0) modified peptides (Schilling, B., et al.), among which inter-linked peptides are the most informative for generating distance constraints. FIGS. 2B-D show the proposed fragmentation schemes of DSSO cross-linked peptides. As shown in FIG. 2B, during CID analysis of a DSSO inter-linked peptide α-β, the cleavage of one C—S bond next to the sulfoxide separates the inter-linked peptide into a pair of peptide fragments, i.e. α_(A)/β_(S), in which the α peptide fragment is modified with the alkene (A) moiety (+54 Da) and the β peptide fragment is modified with the sulfenic acid (S) moiety (+104 Da). If peptides α and β have different sequences, two possible pairs of fragments (i.e. α_(A)/β_(S) and α_(S)/β_(A)) will be observed due to the breakage of either of the two symmetric C—S bonds next to the sulfoxide in the spacer region of DSSO (FIG. 2B), thus resulting in four individual peaks in the MS/MS spectrum. But if peptides α and β have the same sequences, only one fragment pair, i.e. two peaks, will be detected in the MS/MS spectrum. To determine sequences of inter-linked peptides and assign the cross-linking site, the resulting peptide fragments (i.e. α_(A), β_(S), α_(S), or β_(A)) generated in MS/MS can be further subjected to LTQ-Orbitrap XL MS for MS³ analysis. Because these fragments represent single peptide sequences, the interpretation of the MS³ spectra by the Batch-Tag program in Protein Prospector is identical to the identification of a single peptide with a defined modification (remnant of the cross-linker). This will dramatically simplify data interpretation and improve the identification accuracy of cross-linked products.

DSSO dead-end modified peptides have a defined mass modification (+176 Da) due to the half-hydrolyzed DSSO (FIG. 2C). MS/MS analysis of a dead-end modified peptide α_(DN) would result in two possible fragment ions, i.e. α_(A) and α_(S), due to the cleavage of the C—S bond on either side of the sulfoxide. The inventors name the α_(A) and α_(S) fragments as the dead end fragment pair and the mass difference between these fragments correlates to the difference between the remnants of DSSO attached to the fragments. Similarly, intra-linked peptides (e.g. α_(intra)) also have a defined mass modification (+158 Da) due to DSSO cross-linking of two distinct lysines in the same peptide sequence (FIG. 2D). The cleavage of the C—S bond will result in only one fragment peak in MS/MS with the same mass as the parent ion observed in MS. MS³ analysis of fragment ions detected in MS/MS will lead to the detection of y or b ions containing either alkene (A) or sulfenic acid (S) modifications.

As shown in FIG. 2E, the sulfenic acid containing fragment (e.g. α_(S), β_(S), or α_(A+S)) may undergo further fragmentation and lose a water molecule (−18 Da) to generate a new fragment containing an unsaturated thiol (T) moiety (+86 Da) (e.g. α_(T), β_(T), or α_(A+T)). The inventors do not expect any complication with data analysis as the thiol-containing fragment ion will become the dominant ion instead of the sulfenic acid modified fragment ion in the MS/MS spectrum. Thus the inventors anticipate that the total number of pairs and peaks will remain similar as shown in FIGS. 2B-D. Due to specific and unique MS/MS fragmentation patterns for different types of DSSO cross-linked peptides, there are fixed mass relationships between parent ions and their fragment ions as listed in FIG. 2F. For DSSO inter-linked peptides (α-β), the mass sum of each fragment pair (α_(A)/β_(S) or α_(S)/β_(A)) is equivalent to the mass of the parent ion (FIG. 2F, Eq. 1). If α_(S) or β_(S) loses a water and becomes α_(T) or β_(T) respectively, the fragment pairs will be α_(A)/β_(T) and α_(T)/β_(A) and the mass sum of each fragment pair plus a water will be the same as the parent mass (FIG. 2F, Eq. 2). As for the dead-end (DN) modified peptide α_(DN), each fragment (i.e. α_(A), α_(S) or α_(T)) has a distinct mass difference from the parent ion (FIG. 2F, Eq. 3). For the intra-linked peptide α_(intra), the fragment mass could be either the same as the parent mass (i.e. α_(A+S)), or 18 Da less than the parent mass (i.e. α_(A+T)) (FIG. 2F, Eq. 4). Moreover, there is a definite mass difference (Δ 32 Da) between the thiol (T) and alkene (A) modified forms of the same sequence (FIG. 2F, Eq. 5). These characteristic mass relationships have been incorporated into the Link-Finder program to identify DSSO cross-linked peptides.
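The mass relationships described above (FIG. 2F, Eqs. 1-5) can be written out as a short Python sketch. It uses the nominal remnant masses quoted in the text (+54, +104, +86, +176 and +158 Da) rather than exact monoisotopic values, and the function names and dictionary keys are hypothetical choices made for illustration only.

```python
# Nominal remnant masses quoted in the text, in Da.
ALKENE = 54.0      # A
SULFENIC = 104.0   # S
THIOL = 86.0       # T = S minus one water
WATER = 18.0
DEAD_END = 176.0   # half-hydrolyzed DSSO
CROSS_LINK = 158.0 # intact linker bridging two lysines (equals A + S)


def interlinked(m_alpha: float, m_beta: float) -> dict:
    """Parent and fragment masses for an inter-linked peptide alpha-beta."""
    return {
        "parent": m_alpha + m_beta + CROSS_LINK,
        "alpha_A": m_alpha + ALKENE, "alpha_S": m_alpha + SULFENIC,
        "alpha_T": m_alpha + THIOL,
        "beta_A": m_beta + ALKENE, "beta_S": m_beta + SULFENIC,
        "beta_T": m_beta + THIOL,
    }


def dead_end(m_alpha: float) -> dict:
    """Parent and fragment masses for a dead-end modified peptide."""
    return {
        "parent": m_alpha + DEAD_END,
        "alpha_A": m_alpha + ALKENE,
        "alpha_S": m_alpha + SULFENIC,
        "alpha_T": m_alpha + THIOL,
    }


def intra_linked(m_alpha: float) -> dict:
    """Parent and fragment masses for an intra-linked peptide."""
    return {
        "parent": m_alpha + CROSS_LINK,
        "alpha_A+S": m_alpha + ALKENE + SULFENIC,
        "alpha_A+T": m_alpha + ALKENE + THIOL,
    }
```

With these definitions, Eq. 1 reads `alpha_A + beta_S == parent`, Eq. 2 reads `alpha_A + beta_T + WATER == parent`, Eq. 3 gives parent-minus-fragment differences of 122, 72 and 90 Da for the A, S and T forms, Eq. 4 gives `alpha_A+S == parent` and `alpha_A+T == parent - WATER`, and Eq. 5 gives `THIOL - ALKENE == 32`.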

Characterization of DSSO Cross-Linked Model Peptides by MS^(n) Analysis—

To characterize the new DSSO linker, the inventors have first cross-linked several model peptides including Ac-IR7, Ac-myelin, and substance P. Under the experimental conditions, the major cross-linked products for Ac-IR7 and Ac-myelin are inter-linked, whereas substance P mostly formed dead-end modified peptides. All of the cross-linked model peptides were subjected to LC MS^(n) analysis. The inter-linked Ac-IR7 peptide (α-α) was detected as doubly charged (m/z 923.46²⁺) and triply charged (m/z 615.97³⁺) ions (FIG. 3A). MS/MS analyses of the two differently charged parent ions resulted in two dominant fragment ions respectively (FIGS. 3B-C). Since the two inter-linked sequences are identical, only one fragment pair (i.e. α_(A)/α_(S)) was observed as expected. The results suggest that MS/MS fragmentation of inter-linked peptides is independent of peptide charges. It should be noted that besides unique mass relationships, the fragment ions in each pair have a defined charge relationship associated with the charge of the parent ion. In other words, the sum of the observed charges for each fragment in a pair equals the charge of the parent ion. For example, the triply charged parent ion (m/z 615.97³⁺) generated the fragment pair with one doubly charged (α_(A) ^(2+)) and one singly charged (α_(S) ¹⁺) ion, whereas the doubly charged parent ion (m/z 923.46²⁺) only produced a fragment pair with two singly charged (α_(A) ¹⁺ and α_(S) ¹⁺) ions. This information can be used to validate the fragment pairs identified by masses. The respective MS³ analysis of α_(A) and α_(S) ions (FIGS. 3D-E) allowed unambiguous identification of the peptide sequence and cross-linked site based on a series of y and b ions. Similar analysis was carried out for inter-linked Ac-myelin (β-β), and a characteristic fragment pair was observed in MS/MS spectra of the parent ion (β-β) at three different charge states (m/z 458.23⁶⁺, 549.68⁵⁺, 686.84⁴⁺) respectively (FIGS. 3F-I), which represent the expected fragmentation of two identical inter-linked peptides. While the fragment pair β_(A)/β_(S) was detected in MS/MS spectra of quintuply and quadruply charged inter-linked Ac-myelin (β-β) (m/z 549.68⁵⁺, 686.84⁴⁺) (FIGS. 3H-I), the fragment pair β_(A)/β_(T) was observed in the MS/MS spectrum of sextuply charged inter-linked Ac-myelin (β-β) (458.23⁶⁺) (FIG. 3G). The β_(T) fragment, namely the β peptide fragment containing an unsaturated thiol (T) moiety (+86 Da), was generated due to the loss of H₂O from the sulfenic acid moiety on the β_(S) fragment (FIG. 2E). This is likely due to excess collision energy deposited on the highest charged species as the collision energy chosen for CID analysis in LTQ-Orbitrap XL MS does not change with peptide charges during LC MS^(n) runs.
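The charge relationship described above (the charges of the two fragments in a pair sum to the charge of the parent ion) can be sketched in a few lines of Python. The helper names are hypothetical, and the proton mass constant is a standard physical value rather than a number taken from the text.

```python
PROTON = 1.007276  # Da, mass of a proton (standard value, not from the text)


def neutral_mass(mz: float, z: int) -> float:
    """Convert an observed m/z at charge z to a neutral mass in Da."""
    return z * (mz - PROTON)


def charges_consistent(parent_z: int, fragment_zs: tuple) -> bool:
    """True if the fragment charges add up to the parent charge."""
    return sum(fragment_zs) == parent_z


def possible_charge_splits(parent_z: int) -> list:
    """All ways to split the parent charge over two fragments, each >= 1+."""
    return [(z, parent_z - z) for z in range(1, parent_z)]
```

For the Ac-IR7 example, a triply charged parent admits the splits (1, 2) and (2, 1), while a doubly charged parent admits only (1, 1), in line with the fragment pairs reported above. Converting the two parent ions (m/z 615.97³⁺ and m/z 923.46²⁺) to neutral masses gives nearly the same value, as expected for one species observed at two charge states.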

In addition to inter-linked peptides, dead-end modified peptides were analyzed. FIG. 3J displays the MS spectrum of the dead-end (DN) modified substance P (γ_(DN), m/z 538.76²⁺). As predicted in FIG. 2C, MS/MS analysis of γ_(DN) led to two major fragments, the alkene (γ_(A), m/z 478.03²⁺) and sulfenic acid (γ_(S), m/z 502.95²⁺) containing peptide fragments, representing the characteristic feature of dead-end modified peptides. The fragment ions carry the same charge state as the parent ion, and MS³ analysis of the γ_(A) fragment confirmed its sequence unambiguously (FIG. 3L). Taken together, the results clearly demonstrate that the new MS-cleavable bonds in DSSO are labile and can be preferentially fragmented prior to peptide bond breakage, and the desired fragmentation is independent of peptide charge states and sequences.

Characterization of DSSO Cross-Linked Peptides of Model Proteins by MS^(n) Analysis—

The inventors next evaluated the applicability of DSSO for protein cross-linking under physiological conditions. The model proteins cytochrome c (previously described; see Sinz, A. (2003); Kasper, P. T., et al.; Nessen, M. A., et al.; Vellucci, D., et al.; Lee, Y. J., et al.; Pearson, K. M., Pannell, L. K., and Fales, H. M. (2002) Intramolecular Cross-Linking Experiments on Cytochrome C and Ribonuclease A Using an Isotope Multiplet Method. Rapid Commun. Mass Spectrom. 16, 149-159; Dihazi, G. H., and Sinz, A. (2003) Mapping Low-Resolution Three-Dimensional Protein Structures Using Chemical Cross-Linking and Fourier Transform Ion-Cyclotron Resonance Mass Spectrometry. 17, 2005-2014; and Guo, X., Bandyopadhyay, P., Schilling, B., Young, M. M., Fujii, N., Aynechi, T., Guy, R. K., Kuntz, I. D., and Gibson, B. W. (2008) Partial Acetylation of Lysine Residues Improves Intraprotein Cross-Linking. Anal Chem 80, 951-960) and ubiquitin (Chowdhury, S. M., et al.; and Gardner, M. W., et al.) have been extensively utilized to test various new cross-linking strategies since they have a relatively large number of lysine residues accessible for cross-linking. Based on our previous work (see Vellucci, D., et al.), cytochrome c was cross-linked with a 10-fold excess of DSSO. The cytochrome c cross-linking efficiency using DSSO was comparable to the efficiency using DSG or our previously developed Azide-DSG cross-linkers (see Vellucci, D., et al.), indicating that DSSO is as effective for protein cross-linking reactions. The DSSO cross-linked cytochrome c was then digested with trypsin and analyzed by LC MS^(n). Three types of cross-linked peptides of cytochrome c (i.e. inter-link, intra-link and dead-end) have been observed. FIG. 4A displays the MS/MS spectrum of a tryptic peptide of cytochrome c with m/z 419.9716⁴⁺, in which only four abundant fragment ions (m/z 336.42²⁺, 352.40²⁺, 478.99²⁺, 494.96²⁺) were detected, suggesting this peptide as a potential heterodimeric inter-linked peptide (α-β). Two possible fragment pairs, α_(A)/β_(S/T) and α_(S/T)/β_(A), are thus expected, in which S/T means either S (sulfenic) or T (unsaturated thiol) containing fragment ions will be observed. Using the mass relationship between the pairs and the parent ion of inter-linked peptides (Eqs. 1, 2, 5 in FIG. 2F), the inventors identified two fragment pairs as α_(A)/β_(T) (478.99²⁺/352.40²⁺) and α_(T)/β_(A) (494.96²⁺/336.42²⁺), confirming that this peptide is a heterodimeric inter-linked peptide (α-β). Mass mapping of the parent ion (m/z 419.9716⁴⁺) by MS-Bridge revealed that it matches to an inter-linked peptide [Ac-GDVEKGKK (SEQ ID NO: 11) inter-linked to KKGER (SEQ ID NO: 13)] with an error of 0.48 ppm. The fragment ions α_(A) (m/z 478.99²⁺) and β_(T) (m/z 352.40²⁺) were further subjected to MS³ sequencing and their MS³ spectra are illustrated in FIGS. 4B-C. Based on the series of y (i.e. y₁₋₇) and b (i.e. b₂₋₇) ions, the sequence of the MS/MS fragment ion α_(A) (m/z 478.99²⁺) was unambiguously identified as Ac-GDVEK_(A)GKK (SEQ ID NO: 12), in which K (Lys) at the 5th position from the N-terminus was determined to be modified with the alkene moiety. MS³ analysis of the corresponding fragment pair ion β_(T) (m/z 352.40²⁺) determined its sequence as K_(T)KGER (SEQ ID NO: 14). Although there are two lysine residues in the sequence, occurrence of y₄ and a₁ ions indicates that the first N-terminal K is modified with an unsaturated thiol moiety. Taken together, the identity and cross-linking site of the inter-linked peptide [Ac-GDVEKGKK (SEQ ID NO: 11) inter-linked to KKGER (SEQ ID NO: 13)] was determined unambiguously.
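The pair assignment in this cytochrome c example can be checked arithmetically. The Python sketch below converts the quoted m/z values to neutral masses and tests Eq. 2 (pair sum plus water equals parent) and Eq. 5 (thiol form minus alkene form equals 32 Da). The nominal 18 Da and 32 Da values come from the text; the proton mass is a standard constant, and the 1.5 Da tolerance is an assumption chosen for this illustration because the fragment m/z values are quoted to only two decimals.

```python
PROTON = 1.007276           # Da, standard value (not from the text)
WATER = 18.0                # nominal, as quoted in the text
THIOL_MINUS_ALKENE = 32.0   # nominal, Eq. 5
TOL = 1.5                   # Da, illustrative assumption


def neutral_mass(mz: float, z: int) -> float:
    """Convert an observed m/z at charge z to a neutral mass in Da."""
    return z * (mz - PROTON)


# Values quoted in the text for the Ac-GDVEKGKK / KKGER inter-link.
parent = neutral_mass(419.9716, 4)
alpha_A = neutral_mass(478.99, 2)
beta_T = neutral_mass(352.40, 2)
alpha_T = neutral_mass(494.96, 2)
beta_A = neutral_mass(336.42, 2)


def pair_matches_eq2(frag1: float, frag2: float, parent_mass: float,
                     tol: float = TOL) -> bool:
    """Eq. 2: the fragment pair plus one water equals the parent mass."""
    return abs(frag1 + frag2 + WATER - parent_mass) <= tol


def differs_by_32(thiol_form: float, alkene_form: float,
                  tol: float = TOL) -> bool:
    """Eq. 5: thiol and alkene forms of one sequence differ by 32 Da."""
    return abs((thiol_form - alkene_form) - THIOL_MINUS_ALKENE) <= tol
```

Under these assumptions both reported pairs, α_(A)/β_(T) and α_(T)/β_(A), satisfy Eq. 2, each peptide's T and A forms satisfy Eq. 5, and a mismatched combination such as α_(A) with β_(A) does not satisfy Eq. 2.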

FIGS. 5A-C display MS/MS spectra of triply (m/z 641.6730³⁺), quadruply (m/z 481.5069⁴⁺), and quintuply (m/z 385.4070⁵⁺) charged ions of a cytochrome c cross-linked peptide. The MS/MS spectrum of the triply charged ion (m/z 641.6730³⁺) resulted in four dominant fragment ions (m/z 386.24, 418.21, 744.40²⁺, 760.38²⁺), which have been determined as the two fragment pairs α_(A)/β_(T) (744.40²⁺/418.21) and α_(T)/β_(A) (760.38²⁺/386.24), indicating this peptide is a heterodimeric inter-linked peptide. The same characteristic fragment pairs, i.e. α_(A)/β_(T) and α_(T)/β_(A) have also been identified but with different charges in the MS/MS spectra of the quadruply (m/z 481.5069⁴⁺) and quintuply (m/z 385.4070⁵⁺) charged parent ions respectively (FIGS. 5B-C). It is noted that some charge distribution of fragment ions was observed in the pairs (FIG. 5C) due to the high charge state of the parent ion. Nevertheless, the dominant ions are the characteristic fragment ions of the inter-linked peptide. MS³ analysis of the α_(A) (m/z 496.60³⁺) fragment has revealed its sequence identity unambiguously as HK_(A)TGPNLHGLFGR (SEQ ID NO: 17), in which the K (Lys) at position 2 from N-terminus was modified with the alkene moiety (FIG. 5D). In combination with the MS-Bridge result, the inter-linked peptide is identified as [HKTGPNLHGLFGR (SEQ ID NO: 16) inter-linked to GKK]. These results demonstrate that preferred fragmentation of the C—S bonds in DSSO inter-linked peptides of cytochrome c occurs as expected and is independent of peptide charge states and sequences.

To illustrate how dead-end modified peptides of cytochrome c behave in MS^(n) analysis, FIG. 6A shows the MS/MS spectrum of a selected dead-end modified peptide (m/z 880.8975²⁺). As shown, two major fragment ions (m/z 820.20²⁺ and 835.88²⁺) were detected and they are 122 and 90 Da less than the parent ion respectively. Such mass differences between the parent ion and its fragment ions fit well with those predicted for DSSO dead-end modified peptides (Eq. 3 in FIG. 2F), identifying the ion m/z 820.20²⁺ as the α_(A) fragment and the ion m/z 835.88²⁺ as the α_(T) fragment. MS³ analysis of the α_(A) fragment (m/z 820.20²⁺) (FIG. 6B) as well as the MS-Bridge result of the parent ion (m/z 880.8975²⁺) identified its sequence as K_(DN)TGQAPGFSYTDANK (SEQ ID NO: 20).

As discussed above (FIG. 2D), the inventors predict that MS/MS analysis of the intra-linked peptide (α_(intra)) will lead to either a fragment ion (α_(A+S)) containing one K_(A) (Lys_(A)) and one K_(S) (Lys_(S)) with the same mass as the parent ion or a fragment ion (α_(A+T)) containing one K_(A) (Lys_(A)) and one K_(T) (Lys_(T)) with a mass 18 Da less than the original parent ion. FIG. 6C displays the MS/MS spectrum of a cytochrome c tryptic peptide with m/z 611.9802³⁺ in which only one major fragment ion (m/z 606.24³⁺) was detected with a mass 18 Da less than the parent ion. This suggests that the peptide is potentially an intra-linked peptide of cytochrome c and its MS/MS fragment ion (m/z 606.24³⁺) can be labeled as α_(A+T). Mass mapping of the parent ion m/z 611.9802³⁺ using MS-Bridge matched to an intra-linked peptide, GGK*HK*TGPNLHGLFGR (SEQ ID NO: 24), where the two N-terminal K* (Lys*) are linked. Since the CID-induced C—S bond breakage can occur at either side of the sulfoxide, a mixture of two fragments with identical masses but with alkene (A) or thiol (T) moieties at either K can be generated. FIG. 6D illustrates the MS³ spectrum of the MS/MS fragment ion (m/z 606.24³⁺), with a series of y and b ions confirming its identity as GGK_(T)HK_(A)TGPNLHGLFGR (SEQ ID NO: 26) and/or GGK_(A)HK_(T)TGPNLHGLFGR (SEQ ID NO: 25). The detection of y₁₃ (760.43²⁺) and b₃ (297.34) ions indicates the presence of the peptide fragments from the sequence of GGK_(T)HK_(A)TGPNLHGLFGR (SEQ ID NO: 26), and the detection of b₃ ^(*) (329.37), b₄ ^(*) (466.33), y₁₂ ^(*) (692.10²⁺), and y₁₃ ^(*) (744.51²⁺) identified the peptide fragments from the GGK_(A)HK_(T)TGPNLHGLFGR (SEQ ID NO: 25) sequence.

Development of an Integrated Workflow for Fast and Accurate Identification of DSSO Cross-Linked Peptides by LC MS^(n)—

In order to facilitate data analysis for the identification of DSSO cross-linked peptides from complex mixtures, the inventors have developed an integrated workflow for processing LC MS^(n) data acquired by LTQ-Orbitrap XL MS (FIG. 7A). During LC MS^(n) analysis, three types of data are collected, i.e. MS, MS/MS and MS³ spectra, in which MS and MS/MS are acquired in FT mode to allow accurate mass measurement and charge determination of both parent ions in MS and their fragment ions in MS/MS spectra. MS³ is obtained in LTQ to achieve the highest sensitivity. As shown, the first data extraction step is to generate the text files containing peak lists of MS/MS and MS³ data respectively. Based on the unique MS/MS fragmentation profiles of DSSO cross-linked peptides and the defined mass relationships between parent ions and their fragment ions (FIG. 2), Link-Finder program was developed to automatically search MS/MS data to identify putative DSSO cross-linked peptides (FIG. 7B). As discussed above, the inter-linked products produce distinct MS/MS spectra with two pairs of dominant peptide fragments (α_(A)/β_(S/T) and α_(T/S)/β_(A)). For each MS/MS scan, among the top eight most abundant peaks, if there is a fragment pair with a mass sum equal to their parent mass with or without a water loss (−18 Da), the parent ion will be categorized as a possible inter-linked peptide. If two of those pairs can be found, and the mass difference between any two fragments from the two distinct pairs is 32 Da, i.e., the mass difference between the thiol and alkene moieties, then it is almost certain that the parent ion is a true inter-linked product. The dead-end product typically has two major fragment ions representing the parent peptide attached with either a thiol or an alkene moiety. Among the top three peaks, if there are two peaks with mass difference of 32 Da, and one of them is 90 Da less than the parent mass, then it is categorized as a possible dead-end peptide. 
Using the Link-Finder program, a list of parent ions is identified as putative inter-linked or dead-end modified peptides. The generated list of parent ion masses is then subjected to MS-Bridge to identify putative cross-linked peptides of all types by mass matching with high mass accuracy (<10 ppm).
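The classification rules described above can be sketched in Python as follows. This is an illustrative reconstruction of the stated rules, not the Link-Finder source code: the function names, the return labels, the representation of peaks as (neutral mass, intensity) tuples, and the 0.5 Da default tolerance are all assumptions. The top-eight and top-three peak limits and the 18, 32 and 90 Da values are taken from the text.

```python
from itertools import combinations

WATER = 18.0
THIOL_MINUS_ALKENE = 32.0
DEAD_END_MINUS_THIOL = 90.0


def top_peaks(peaks, n):
    """Masses of the n most intense peaks; peaks are (mass, intensity)."""
    ranked = sorted(peaks, key=lambda p: p[1], reverse=True)
    return [mass for mass, _ in ranked[:n]]


def find_pairs(parent_mass, masses, tol):
    """Fragment pairs summing to the parent mass, with or without water loss."""
    pairs = []
    for a, b in combinations(masses, 2):
        total = a + b
        if (abs(total - parent_mass) <= tol
                or abs(total + WATER - parent_mass) <= tol):
            pairs.append((a, b))
    return pairs


def classify(parent_mass, peaks, tol=0.5):
    """Label one MS/MS scan using the inter-link and dead-end rules."""
    # Inter-link rule: look for fragment pairs among the top eight peaks.
    pairs = find_pairs(parent_mass, top_peaks(peaks, 8), tol)
    for p, q in combinations(pairs, 2):
        # Two pairs whose members differ by 32 Da (thiol vs. alkene).
        if any(abs(abs(x - y) - THIOL_MINUS_ALKENE) <= tol
               for x in p for y in q):
            return "likely inter-linked"
    if pairs:
        return "possible inter-linked"
    # Dead-end rule: two of the top three peaks 32 Da apart,
    # one of them 90 Da below the parent mass.
    for a, b in combinations(top_peaks(peaks, 3), 2):
        if abs(abs(a - b) - THIOL_MINUS_ALKENE) <= tol and any(
                abs(parent_mass - m - DEAD_END_MINUS_THIOL) <= tol
                for m in (a, b)):
            return "possible dead-end"
    return "unclassified"
```

The sketch assumes fragment m/z values have already been converted to neutral masses, which is why the measured charge states from the FT-mode MS/MS scans matter in practice.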

For MS³ data, only the original parent ion observed in MS scan is listed as the precursor ion during database searching. In order to extract the MS³ parent ion (fragment ions in MS/MS), for Batch-Tag search, the second data extraction step is carried out using in-house scripts to generate a modified MS³-txt file. The Batch-Tag search result provides high confidence identification of single peptide fragments generated in MS/MS that are initially cross-linked. Finally, the results from three different types of searches, i.e. Batch-Tag (MS³ data), Link-Finder (MS/MS data), and MS-Bridge (MS data) are integrated using in-house scripts within Link-Finder program to obtain accurate and reliable identification of cross-linked peptides. Among them, MS³ sequencing with Batch-Tag searching is essential for unambiguous identification of cross-linking sites.
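The integration step can be pictured as collecting, for each parent ion, which of the three searches supports it. The Python sketch below is a hypothetical illustration only; the in-house scripts are not described in detail here, and the dictionary layout, identifiers and function names are assumptions.

```python
def integrate(link_finder: dict, ms_bridge: dict, batch_tag: dict) -> dict:
    """Map each parent-ion identifier to the searches that support it.

    Each argument maps a hypothetical parent-ion identifier (for example
    a scan number) to that program's result for the ion.
    """
    sources = (
        ("Link-Finder", link_finder),  # MS/MS fragment pairs
        ("MS-Bridge", ms_bridge),      # MS parent mass mapping
        ("Batch-Tag", batch_tag),      # MS3 sequencing
    )
    all_ids = set(link_finder) | set(ms_bridge) | set(batch_tag)
    return {
        pid: [name for name, results in sources if pid in results]
        for pid in sorted(all_ids)
    }


def high_confidence(report: dict) -> list:
    """Parent ions supported by all three lines of evidence."""
    return [pid for pid, evidence in report.items() if len(evidence) == 3]
```

In this picture, a parent ion reported by all three programs would be carried forward as a high-confidence identification, while one supported only by mass mapping would remain a candidate.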

Identification of DSSO Cross-Linked Peptides of Model Proteins by Automated Database Searching—

The newly developed integrated workflow was first employed to identify DSSO cross-linked peptides of cytochrome c. In total, 19 inter-linked peptides have been unambiguously identified and summarized in TABLE 1 (for details see TABLE 3 and FIG. 11). Each peptide has characteristic fragment pairs in MS/MS spectra and was identified by the Link-Finder program. In addition, one or two MS/MS fragment pair ions have been sequenced by MS³ to provide unambiguous identification. Moreover, all of the parent masses fit well with identified cross-linked peptides by the MS-Bridge program with high mass accuracy. In comparison to reported cross-linking studies of cytochrome c (Schilling, B., et al.; Kasper, P. T. et al.; Nessen, M. A. et al.; Vellucci, D. et al.; Lee, Y. J., et al.; Pearson, K. M., et al.; Dihazi, G. H.; and Guo, X., et al.), three novel inter-links have been identified in this work. Besides the inter-linked peptides, 7 intra-linked and 8 dead-end peptides have also been identified (see TABLE 3). For the dead-end modified peptides, each has a dead-end fragment pair and at least one of the fragment ions has been sequenced, which correlates very well with MS-Bridge and Batch-Tag results. The intra-linked peptides were mainly identified by Batch-Tag and MS-Bridge results.

In addition to products with one cross-link (i.e. type 0, 1 and 2), peptides containing two cross-links have also been identified using this integrated workflow. In this work, 11 non-redundant DSSO cross-linked peptides with two links (e.g. one inter-link with one dead-end, one inter-link with one intra-link, or one intra-link with one dead-end) have been identified and summarized in TABLE 3. This type of information is not commonly reported since peptide sequencing of multi-linked peptides is highly complicated. This demonstrates the ability of our new cross-linking strategy for identifying such complex products.

Based on the crystal structure of bovine heart cytochrome c (PDB ID: 2B4Z) (44), the inventors have calculated the distances between alpha carbons of the identified cross-linked lysine residues (TABLE 1 and TABLE 3). Among the 26 non-redundant inter-linked lysines in cytochrome c identified in this work (excluding linkages between two adjacent lysines), all of the linkages have the distances between their alpha carbons within the range of 5.3 Å to 19.3 Å. This is consistent not only with the length of a fully expanded DSSO (10.1 Å spacer length) and two lysine side chains, but also with the previous results using similar lengths of NHS ester cross-linkers (see Vellucci, D., et al.; Lee, Y. J., et al.; Guo, X., et al.; and Kruppa, G. H., Schoeniger, J., and Young, M. M. (2003) A Top Down Approach to Protein Structural Studies Using Chemical Cross-Linking and Fourier Transform Mass Spectrometry. Rapid Commun Mass Spectrom 17, 155-162). The results suggest that our cross-linking conditions did not induce significant disturbance to cytochrome c structural conformations.

In addition to cytochrome c, the same strategy has been successfully applied to identify DSSO cross-linked peptides of ubiquitin. Using the same analysis strategy, 3 inter-linked, 1 intra-linked, and 5 dead-end peptides have been identified as summarized in TABLE 4 and FIG. 11. Based on the crystal structure of bovine ubiquitin (PDB ID: 1AAR), all of the identified inter-/intra-linked lysines in ubiquitin have the distances between their alpha carbons within the range of 6 to 18 Å. The identified cross-linked lysines are consistent with the known structure of ubiquitin and previous reports (Chowdhury, S. M., et al.; and Gardner, M. W., et al.). It is interesting to note that one of the identified inter-linked peptides is [LIFAGK⁴⁸QLEDGR (SEQ ID NO: 63) inter-linked to LIFAGK⁴⁸QLEDGR (SEQ ID NO: 63)], which is a cross-link formed between the two subunits of the ubiquitin dimer. Residue K⁴⁸ is located at a hydrophobic patch important for protein interactions and K⁴⁸ is also an in vivo chain linkage site for polyubiquitination required for ubiquitin/ATP dependent proteasomal degradation (Pickart, C. M., and Cohen, R. E. (2004) Proteasomes and Their Kin: Proteases in the Machine Age. Nat Rev Mol Cell Biol. 5, 177-187). The same K⁴⁸-K⁴⁸ (Lys⁴⁸-Lys⁴⁸) cross-link was identified previously using an alkyne-tagged NHS ester, but only after selective enrichment coupled with CID and ETD analyses (Chowdhury, S. M., et al.). In comparison, the inventors were able to identify the K⁴⁸ inter-linked peptide without any enrichment, thus further demonstrating the effectiveness of our approach to identify DSSO cross-linked peptides from complex mixtures.

Structural Elucidation of the Yeast 20 S Proteasome Complex Using DSSO Cross-Linking—

The ubiquitin-proteasome degradation pathway plays an important role in regulating many biological processes (Pickart, C. M., et al.). The 26 S proteasome complex is the macromolecular machine responsible for ubiquitin/ATP dependent protein degradation, and it is composed of two subcomplexes: the 20 S core particle and the 19 S regulatory complex. To date, only the crystal structure of the 20 S proteasome complex has been resolved. However, structures of the 19 S and 26 S remain elusive, thus hindering the understanding of the structure and functional relationship of the 26 S proteasome complex. To develop an effective cross-linking strategy to elucidate structures of the 19 S and 26 S proteasome complexes, the inventors have therefore investigated the structure of the yeast 20 S proteasome complex using the DSSO cross-linking approach. The cross-linking of the 20 S proteasome complex was carried out in PBS buffer under conditions allowing efficient cross-linking of all subunits as based on 1-D SDS-PAGE (FIG. 12). The tryptic digest of the cross-linked proteasome complex was subjected to LC MS^(n) analysis and the data were analyzed using the integrated workflow described above (FIG. 7). In total, 13 unique inter-linked peptides were identified including 10 intra-subunit and 3 inter-subunit heterodimeric inter-links as summarized in TABLE 2 (for details see TABLE 5), which were determined unambiguously by integration of Link-Finder, Batch-Tag (MS³ sequencing, see FIG. 13), and MS-Bridge (mass mapping of the cross-linked peptides) results. As an example, FIG. 8A displays the MS/MS spectrum of a DSSO heterodimeric inter-linked peptide α-β (m/z 833.9231⁴⁺) of the yeast 20 S proteasome complex, in which two fragment pairs were detected and determined as α_(A)/β_(T) (868.45²⁺/790.39²⁺) and α_(T)/β_(A) (884.44²⁺/774.41²⁺). MS³ analysis of the α_(A) fragment (m/z 868.45²⁺) identified the α chain unambiguously as NK_(A)PELYQIDYLGTK (SEQ ID NO: 28), which matched to 20 S subunit β4.
In this sequence, K_(A) is modified with the alkene moiety. In addition, MS³ analysis of the β_(T) fragment (m/z 790.39²⁺) identified the β chain unambiguously as LGSQSLGVSNK_(T)FEK (SEQ ID NO: 30), which matched to 20 S subunit β3. Here, K_(T) is modified with an unsaturated thiol moiety. Mass mapping by MS-Bridge further confirmed this inter-subunit (β4-β3) inter-linked peptide as [NKPELYQIDYLGTK (SEQ ID NO: 27) inter-linked to LGSQSLGVSNKFEK (SEQ ID NO: 29)].

In addition, 21 dead-end modified peptides were identified by multiple lines of evidence as illustrated in TABLE 5. The fragmentation behavior for the dead-end modified peptides of the 20 S subunits is the same as that of cytochrome c showing two distinct dead-end pairs in MS/MS spectra. This is illustrated with an example shown in FIG. 14.

The experimentally determined structure of the yeast 20 S proteasome holocomplex was utilized (Protein Data Bank code 1RYP) to assess the cross-linked lysine pairs identified in this study. For each identified cross-link the distance between the alpha carbons was calculated and the results are summarized in TABLE 2. Considering the spacer length of DSSO and lysine side chains, the theoretical upper limit for the distance between the alpha carbon atoms of paired lysines is approximately 26 Å. The inventors' reported distances are within this upper limit, providing some evidence that the proteasome cross-links are formed in the native state. The quaternary proteasome structure is formed by four stacked seven-member rings in the order αββα. The side view and basal view of the arrangement among one set of the symmetric αβ rings and their subunits are shown in FIG. 9. The alpha carbon trace is shown for all subunits and the cross-linked lysines are shown in space fill representation. Lysines forming intra-subunit cross-links appear in blue and those forming inter-subunit cross-links appear in red. The images in FIG. 9 were generated using UCSF Chimera visualization software (Pettersen, E., Goddard, T., Huang, C., Couch, G., Greenblatt, D., Meng, E., and Ferrin, T. (2004) Ucsf Chimera—a Visualization System for Exploratory Research and Analysis. Journal of computational chemistry 25, 1605-1612).
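The distance check described above amounts to a Euclidean distance between two alpha-carbon coordinates compared against the approximate 26 Å upper limit. The Python sketch below illustrates this with hypothetical coordinates; it does not parse PDB files, and the function names are assumptions.

```python
import math

UPPER_LIMIT = 26.0  # Angstrom, approximate upper limit stated in the text


def ca_distance(ca1, ca2) -> float:
    """Euclidean distance between two alpha-carbon coordinates (x, y, z)."""
    return math.dist(ca1, ca2)


def within_limit(ca1, ca2, limit: float = UPPER_LIMIT) -> bool:
    """True if the alpha-carbon distance is compatible with a DSSO cross-link."""
    return ca_distance(ca1, ca2) <= limit
```

Coordinates would in practice be taken from the alpha-carbon records of the lysine residues in the deposited structure; a pair farther apart than the limit would suggest either a non-native conformation or a mis-assigned cross-link.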

DISCUSSION

The inventors have presented a novel cross-linking strategy for structural analysis of model proteins and the yeast 20 S proteasome complex by combining a newly designed MS-cleavable cross-linker DSSO with an integrated data analysis workflow. As noted above, while this discussion has centered around DSSO (shown as Compound 1 in FIG. 1), other compounds having the General Structure 2, such as Compounds 3-6, can also be used. This approach is effective and facilitates fast and accurate identification of DSSO cross-linked peptides by LC MS^(n). The new MS-cleavable cross-linker DSSO is attractive for cross-linking studies of protein complexes for a number of reasons: 1) it can be easily synthesized and can cross-link protein complexes effectively at sub-micromolar concentrations (~1 μM); 2) it has two symmetric CID labile C—S bonds that preferentially fragment prior to peptide backbone breakage; 3) the CID-induced cleavage of inter-linked peptides is specific and independent of peptide charges and sequences; 4) DSSO cross-linked peptides can generate characteristic fragmentation patterns in MS/MS spectra that are unique to different types of cross-linked peptides for easy identification; 5) there are unique mass and charge relationships between MS/MS peptide fragment ions and their parent ions, permitting automated data processing. In comparison to existing MS-cleavable cross-linkers (Tang, X., et al.; Zhang, H., et al.; Soderblom, E. J., and Goshe, M. B. et al.; Soderblom, E. J., Bobay, B. G., et al.; and Gardner, M. W., et al.), the DSSO cross-linker can provide a specific and selective fragmentation of cross-linked peptides for identification. The fragmentation patterns of DSSO cross-linked peptides are similar to those of “fixed charge” sulfonium ion containing cross-linked model peptides developed by Lu, Y. et al.
Although DSSO does not carry a fixed charge, our results have demonstrated that the preferential cleavage of the C—S bond adjacent to the sulfoxide in DSSO is as effective as cleavage of the C—S bond in the sulfonium ion containing cross-linker (i.e. S-methyl 5,5′-thiodipentanoylhydroxysuccinimide) (Lu, Y. et al.). However, fragmentation of the sulfonium ion containing cross-linked peptide requires the formation of a five-membered ring with the sulfonium ion and the amide of the linker, such that it is not feasible to change spacer lengths in these cross-linkers. In contrast, the simple fragmentation mechanism gives DSSO the flexibility of changing its spacer lengths to accommodate cross-linking lysines at different distances while maintaining the symmetry of the linker with easily interpretable fragmentation patterns. In addition, DSSO has better potential for studying protein interactions by in vivo cross-linking. It is well known that cross-linking studies of protein complexes are extremely challenging due to the inherent limitations of current cross-linkers. With improvements in database searching of non-cleavable inter-linked peptides, it is possible to identify cross-linked peptides of protein complexes using non-cleavable cross-linkers (Maiolica, A., et al.; and Chen, Z. A. et al.). However, this requires a special program for data interpretation, and the false positive rate of identifying inter-linked sequences is higher than that of identifying single sequences. Here the inventors have demonstrated the feasibility of using the novel DSSO cross-linking strategy to study the structure of the yeast 20S proteasome complex. This work represents a major advancement in structural elucidation of multi-subunit protein complexes with improved data analysis and accuracy, as such an application of MS-cleavable cross-linkers has not been reported before.
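The mass and charge relationships between MS/MS fragment ions and their parent ions, listed above as reason 5), lend themselves to a simple automated test. The following is a minimal sketch under stated assumptions and is not the Link-Finder implementation: it assumes that cleavage of one C—S bond splits an inter-linked parent ion into two fragments whose neutral masses sum to the parent neutral mass and whose charges sum to the parent charge. The function names, the 20 ppm tolerance, and the numerical values are hypothetical.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def neutral_mass(mz, z):
    """Neutral mass of an ion observed at m/z `mz` with charge `z`."""
    return z * (mz - PROTON_MASS)


def mz_from_mass(mass, z):
    """m/z of a protonated ion with neutral mass `mass` and charge `z`."""
    return mass / z + PROTON_MASS


def is_complementary_pair(parent, frag_a, frag_b, tol_ppm=20.0):
    """Check whether two MS/MS fragments account for the whole parent ion.

    Each argument is an (mz, z) tuple. The pair is accepted when the
    fragment charges sum to the parent charge and the fragment neutral
    masses sum to the parent neutral mass within `tol_ppm`.
    The tolerance is an assumed value, not one taken from the text.
    """
    (p_mz, p_z), (a_mz, a_z), (b_mz, b_z) = parent, frag_a, frag_b
    if a_z + b_z != p_z:
        return False
    m_parent = neutral_mass(p_mz, p_z)
    m_sum = neutral_mass(a_mz, a_z) + neutral_mass(b_mz, b_z)
    return abs(m_sum - m_parent) / m_parent * 1e6 <= tol_ppm


# Made-up values for illustration: a 4+ parent splitting into two 2+ fragments.
parent_ion = (mz_from_mass(2501.2, 4), 4)
fragment_a = (mz_from_mass(1000.5, 2), 2)
fragment_b = (mz_from_mass(1500.7, 2), 2)
print(is_complementary_pair(parent_ion, fragment_a, fragment_b))
```

A screening program could apply such a test to every pair of abundant fragment ions in an MS/MS spectrum and retain the parent ions for which at least one pair passes.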

In addition to the design of this novel MS-cleavable linker, the inventors have developed an integrated data analysis workflow to achieve fast, easy and accurate identification of cross-linked peptides and the cross-linking sites. Identification of DSSO cross-linked peptides from complex mixtures has been accomplished with high confidence by integrating data analyses of three different datasets: MS, MS/MS and MS³ data. Due to the difficulty in interpreting MS/MS spectra of unseparated inter-linked peptides, many previously reported inter-linked products were determined based only on parent masses. In contrast, all of the inter-linked peptides of cytochrome c, ubiquitin and the yeast 20 S proteasome complex have been identified in this work with three lines of evidence, including characteristic fragmentation pairs (Link-Finder), peptide sequence determination by MS³ sequencing (Batch-Tag), and mass mapping (MS-Bridge). This procedure permits the identification of cross-linked peptides with high accuracy, reliability and speed. It is important to note that existing database search programs can be easily adapted for analyzing DSSO cross-linked peptides, so a broad application of the DSSO-based cross-linking strategy is foreseeable. Furthermore, cross-linked peptides of cytochrome c with two links can be identified, suggesting the capability of the new cross-linking strategy for identifying more complex cross-linked products.
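The integration of the three lines of evidence can be sketched as follows. This is an illustrative sketch only: the data structures do not reflect the actual output formats of Link-Finder, Batch-Tag or MS-Bridge, the function name `integrate_evidence` and the scan identifiers are hypothetical, the peptide sequences are made up, and the acceptance rule (both mass-mapped peptides must also be sequenced by MS³) is an assumption chosen for the example.

```python
def integrate_evidence(link_finder_parents, batch_tag_hits, ms_bridge_hits):
    """Keep only cross-link candidates supported by all three analyses.

    link_finder_parents: set of parent-ion IDs whose MS/MS spectra showed
        characteristic fragment pairs.
    batch_tag_hits: dict mapping parent-ion ID -> peptide sequences
        identified from the MS3 spectra of that parent's fragments.
    ms_bridge_hits: dict mapping parent-ion ID -> (peptide_a, peptide_b)
        pair whose cross-linked mass matches the parent mass.
    Returns a dict mapping confirmed parent-ion ID -> peptide pair.
    """
    confirmed = {}
    for parent in sorted(link_finder_parents):
        mapped = ms_bridge_hits.get(parent)
        if mapped is None:
            continue  # no mass-mapping support
        sequenced = set(batch_tag_hits.get(parent, ()))
        # Assumed rule: every mass-mapped peptide must also be sequenced.
        if set(mapped) <= sequenced:
            confirmed[parent] = tuple(mapped)
    return confirmed


# Hypothetical inputs keyed by made-up parent-ion IDs.
link_finder = {101, 102, 103}
batch_tag = {101: ["AAKR", "GGKLR"], 102: ["AAKR"], 104: ["TTKVR"]}
ms_bridge = {101: ("AAKR", "GGKLR"), 102: ("AAKR", "TTKVR"), 103: ("GGKLR", "TTKVR")}

result = integrate_evidence(link_finder, batch_tag, ms_bridge)
print(result)
```

Here only parent ion 101 is retained, because it is the only candidate for which the fragment-pair, MS³ sequencing and mass-mapping results all agree.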

Cross-linking/mass spectrometry has been previously attempted to study the yeast 20S proteasome complex using Ru(II)(bpy)₃²⁺ (tris(2,2′-bipyridyl)ruthenium(II) dication)/ammonium persulfate/light-mediated cross-linking (Denison, C., and Kodadek, T. (2004) Toward a General Chemical Method for Rapidly Mapping Multi-Protein Complexes. J Proteome Res 3, 417-425), in which multiple subunit interconnectivity was determined based on MS identification of co-migrated subunits by SDS-PAGE after cross-linking. No cross-linked peptides were identified due to the complicated chemistry of the radical-based cross-linking reaction. Therefore the inventors' work describes the first successful use of a cross-linking/mass spectrometry strategy to determine inter-subunit and intra-subunit interaction interfaces of the yeast 20 S proteasome complex. Although only 13 inter-linked peptides of the yeast 20 S proteasome have been identified and reported here, this work presents the first step toward full characterization of proteasome structures using cross-linking/mass spectrometry in the future. The feasibility of using the DSSO-based cross-linking strategy to identify cross-linked peptides of a large protein complex at a concentration of 1 μM or less is very significant and of great promise to structural studies of protein complexes, since purifying protein complexes at high concentrations is technically challenging.

During LC MS^(n) analysis using LTQ-Orbitrap XL MS, collision energy cannot be adjusted on the fly to account for differences in peptide charge states, so a compromise collision energy is set for the entire LC MS^(n) run. Thus there exists a possibility that the collision energy may be too high for highly charged ions and too low for peptides with lower charges. Future improvements in charge selection and energy adjustment during LC MS^(n) data acquisition may be needed to further enhance the quality of the results. Additionally, optimized peptide separation prior to LC MS^(n) analysis will be necessary to improve the dynamic range of peptide analysis and allow the detection of low abundance cross-linked peptides. Moreover, refinement of the Link-Finder program is needed to improve the identification of intra-linked peptides. Lastly, the addition of an affinity tag to the sulfoxide containing cross-linker will improve detection of cross-linked peptides, which will be the subject of our future study.

In summary, the inventors have developed a new family of MS-cleavable cross-linker compounds, including DSSO, that are applicable to model peptides, proteins and a multi-subunit protein complex. The unique MS features of DSSO cross-linked peptides, together with our integrated data analysis workflow for analyzing LC MS^(n) data, greatly reduce the time spent identifying cross-linked peptides. Given its simplicity, speed and accuracy, the inventors believe that this cross-linking strategy will have a broad application in elucidating structures of proteins and protein complexes in the future.

Although embodiments of the present invention have been described in detail herein in connection with certain exemplary embodiments, it will be understood that the invention is not limited to the disclosed exemplary embodiments, but, on the contrary, is intended to cover various modifications and equivalent arrangements. The invention is limited only by the appended claims and their equivalents. 

What is claimed is:
 1. An MS-cleavable cross-linker for proteins and protein complexes, the cross-linker having two symmetric collision-induced dissociation (CID) cleavable sites and the formula:

where X is selected from the group consisting of

wherein R is methyl or ethyl, and


2. An MS-cleavable cross-linker as recited in claim 1, having the structure:


3. A method for mapping protein-protein interactions of protein complexes, comprising: providing an MS-cleavable cross-linker as recited in claim 1; forming a cross-linked protein complex by cross-linking proteins with the MS-cleavable cross-linker; forming protein and/or peptide fragments that are chemically bound to the MS-cleavable cross-linker by digesting the cross-linked protein complex with an enzyme such as trypsin; and using mass spectrometry (MS) and MS^(n) to identify the protein and/or peptide fragments.
 4. A method for integrated data analysis workflow for identification of cross-linked peptides, comprising: providing cross-linked peptides, each cross-linked peptide comprising an MS-cleavable cross-linker; performing mass spectrometry on the cross-linked peptides to obtain MS data, MS/MS data, and MS³ data; identifying the MS/MS data comprising characteristic fragmentation profiles of MS-cleavable cross-linked peptides to obtain an MS/MS result comprising a list of parent ions corresponding to cross-linked peptide candidates; mass mapping the MS data using the list of parent ions corresponding to the cross-linked peptide candidates and the MS-cleavable cross-linker against known protein sequences to obtain an MS result; peptide sequencing the cross-linked peptides using the MS³ data to obtain an MS³ result; and integrating the MS result, the MS/MS result, and MS³ result to identify at least one of the cross-linked peptides.
 5. The method for integrated data analysis workflow for identification of cross-linked peptides of claim 4, wherein the MS-cleavable cross-linker is DSSO.
 6. The method for integrated data analysis workflow for identification of cross-linked peptides of claim 4, wherein the MS data is obtained in Fourier transform (FT) mode.
 7. The method for integrated data analysis workflow for identification of cross-linked peptides of claim 4, wherein the MS/MS data is obtained in Fourier transform (FT) mode.
 8. The method for integrated data analysis workflow for identification of cross-linked peptides of claim 4, wherein the MS³ data is obtained using a linear trap quadrupole (LTQ).
 9. The method for integrated data analysis workflow for identification of cross-linked peptides of claim 4, wherein the identifying the MS/MS data is carried out using Link-Finder.
 10. The method for integrated data analysis workflow for identification of cross-linked peptides of claim 4, wherein the mass mapping is carried out using Protein Prospector MS-Bridge.
 11. The method for integrated data analysis workflow for identification of cross-linked peptides of claim 4, wherein the peptide sequencing is carried out using Protein Prospector Batch-Tag.
 12. The method for integrated data analysis workflow for identification of cross-linked peptides of claim 4, further comprising reformatting the MS³ data such that data from MS³ fragment ions is linked to data from MS/MS parent ions.
 13. The method for integrated data analysis workflow for identification of cross-linked peptides of claim 4, wherein the performing mass spectrometry on the cross-linked peptides to obtain the MS data, the MS/MS data, and the MS³ data comprises: obtaining an MS spectrum; obtaining an MS/MS spectrum; obtaining an MS³ spectrum; extracting the MS data from the MS spectrum; extracting the MS/MS data from the MS/MS spectrum; and extracting the MS³ data from the MS³ spectrum.
 14. The method for integrated data analysis workflow for identification of cross-linked peptides of claim 4, wherein the MS-cleavable cross-linker has the formula:

where X is selected from the group consisting of

wherein R is methyl or ethyl, and 